Abstract

Non-parametric estimation of a convex discrete distribution may be of interest in several applications, such as the estimation of species abundance distributions in ecology. In this paper we study the least squares estimator of a discrete distribution under the constraint of convexity. We show that this estimator exists and is unique, and that it always outperforms the classical empirical estimator in terms of the $\ell_2$-distance. We provide an algorithm for its computation, based on the support reduction algorithm. Using a simulation study, we compare its performance with that of the empirical estimator.
1. Introduction

Nonparametric estimation of the distribution of a continuous random variable under a monotonicity constraint, based on $n$ i.i.d. copies $X_1, \dots, X_n$, has received a great deal of attention in recent decades; see Balabdaoui and Wellner (2005) for a review. The most studied constraint is monotonicity of the density function. It is well known that the nonparametric maximum likelihood estimator of a decreasing density function over $[0, \infty)$ is the Grenander estimator. This estimator is defined as the left-continuous slope of the least concave majorant of the empirical distribution function of $X_1, \dots, X_n$. It can easily be implemented using the PAVA (pool adjacent violators algorithm) or a similar device; see Barlow et al. (1972).
Another well-studied constraint is monotonicity of the first derivative of the density, which amounts to assuming that the density function is convex (or concave) over a given interval. Groeneboom et al. (2001) showed that both the least squares estimator and the nonparametric maximum likelihood estimator under the convexity constraint exist and are unique. That paper gives a precise characterization of these estimators, but their practical implementation is non-trivial. It requires sophisticated iterative algorithms that use a mixture representation, such as the support reduction algorithm described in Groeneboom et al. (2008). The nonparametric maximum likelihood estimator of a log-concave density function (i.e., a density function $f$ such that $\log(f)$ is concave) was introduced in Rufibach (2006). Algorithmic aspects were treated in Rufibach (2007) and in Dümbgen et al. (2007), where an algorithm similar to the support reduction algorithm is defined.

Recently, the problem of estimating a discrete probability mass function under a shape constraint has attracted attention. Jankowski and Wellner (2009) considered the nonparametric estimation of a monotone distribution, and Balabdaoui et al. (2011) considered the case of a log-concave distribution.

In this paper, we consider the nonparametric estimation of a discrete distribution on $\mathbb{N}$ under the convexity constraint. This problem has not yet been considered in the literature, although it has several applications, such as the estimation of species abundance distributions in ecology. In this field, the term "nonparametric methods" often refers to finite mixtures of parametric distributions in which only the mixing distribution is inferred nonparametrically; see e.g. Böhning and Kuhnert (2006), Böhning et al. (2005) and Chao and Shen (2004).

We study the least squares estimator of a discrete distribution on $\mathbb{N}$ under the constraint of convexity. First, we prove that this estimator exists and is unique, and that it always outperforms the classical empirical estimator in terms of the $\ell_2$-distance. Then, we consider computational issues. As in the continuous case, we prove that convex discrete distributions can be represented as a (possibly infinite) mixture of triangular functions on $\mathbb{N}$. Based on this characterization, we derive an algorithm that provides the least squares estimate, even though both the number of components in the mixture and the support of the estimator are unknown. This algorithm adapts the support reduction algorithm of Groeneboom et al. (2008) to our problem. Finally, we assess the performance of the least squares estimator under the convexity constraint through a simulation study.

The paper is organized as follows. Theoretical properties of the constrained least squares estimator are given in Section 2. Section 3 is devoted to computational issues. A simulation study is reported in Section 4, and the proofs are postponed to Section 5.

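Notation. Let us define some notation that will be used throughout the paper.

• $\mathcal{K}$ is the set of convex functions $f$ on $\mathbb{N}$ such that $\lim_{i \to \infty} f(i) = 0$. Recall that a discrete function $f : \mathbb{N} \to \mathbb{R}$ is convex if and only if it satisfies

$$ f(\lambda i + (1 - \lambda) j) \le \lambda f(i) + (1 - \lambda) f(j) \tag{1} $$

for all $i$ and $j$ in $\mathbb{N}$ and all $\lambda \in [0, 1]$ such that $\lambda i + (1 - \lambda) j \in \mathbb{N}$. Equivalently, $f$ is convex if and only if

$$ f(i) - f(i-1) \le f(i+1) - f(i) \tag{2} $$

for all $i \ge 1$. In particular, any $f \in \mathcal{K}$ must be non-negative and non-increasing, and strictly decreasing on its support.

• $\mathcal{C}$ is the set of all convex probability mass functions on $\mathbb{N}$, i.e., the set of functions $f \in \mathcal{K}$ satisfying $\sum_{i \ge 0} f(i) = 1$.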
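Condition (2) is easy to check numerically. The following minimal Python sketch tests membership in $\mathcal{K}$ and $\mathcal{C}$ for a function given by finitely many values $f(0), \dots, f(m)$ and extended by zeros beyond $m$. NumPy is assumed, and the function names `is_convex_on_N` and `is_convex_pmf` are hypothetical, not taken from the paper.

```python
import numpy as np

def is_convex_on_N(f, tol=1e-12):
    """Check condition (2) for f(0), ..., f(m), extended by f(i) = 0 for i > m."""
    g = np.append(np.asarray(f, dtype=float), [0.0, 0.0])  # zero extension
    # g[i-1] - 2 g[i] + g[i+1] for i = 1, ..., m + 1 (all later ones vanish)
    second_diff = g[:-2] - 2.0 * g[1:-1] + g[2:]
    return bool(np.all(second_diff >= -tol))

def is_convex_pmf(f, tol=1e-12):
    """Membership in C: convex on N, vanishing at infinity, total mass one."""
    return is_convex_on_N(f, tol) and abs(float(np.sum(f)) - 1.0) <= 1e-9
```

For instance, $T_3 = (1/2, 1/3, 1/6)$ (defined in Section 3.1) passes both tests. By contrast, $(0.2, 0.6, 0.2)$ violates (2) at $i = 1$.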



2. The constrained LSE of a convex discrete distribution 

2.1. The main result 

Suppose that we observe $n$ i.i.d. random variables $X_1, \dots, X_n$ that take values in $\mathbb{N}$, and that the common probability mass function $p_0$ of these variables is convex on $\mathbb{N}$ with an unknown support. Based on these observations, we aim to build an estimator of $p_0$ that satisfies the convexity constraint.

For this task, define the empirical estimator $\tilde p_n$ of $p_0$ by

$$ \tilde p_n(j) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_i = j\} $$

for all $j \in \mathbb{N}$, and consider the criterion function

$$ Q_n(f) = \frac{1}{2} \sum_{j \ge 0} f^2(j) - \sum_{j \ge 0} f(j)\, \tilde p_n(j) $$

for all functions $f : \mathbb{N} \to \mathbb{R}$. The empirical estimator $\tilde p_n$ may be non-convex. To build a convex estimator, we therefore minimize the criterion function $Q_n$ over the set $\mathcal{C}$. The minimizer $\hat p_n$ (which exists according to Theorem 1 below) is called the constrained least squares estimator (LSE) of $p_0$, because it also minimizes the least squares criterion

$$ \sum_{j \ge 0} \big(f(j) - \tilde p_n(j)\big)^2 = 2\, Q_n(f) + \sum_{j \ge 0} \tilde p_n^2(j). $$

Clearly, when $\tilde p_n$ is convex, the constrained LSE coincides with $\tilde p_n$. When $\tilde p_n$ is non-convex, on the other hand, the constrained LSE outperforms the empirical estimator $\tilde p_n$, as detailed in Section 2.2.
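The identity between the least squares criterion and $Q_n$ can be checked numerically. Below is a small NumPy sketch of such a check. The helper names and the toy sample $0, 1, 1, 1, 2$ are illustrative assumptions.

```python
import numpy as np

def empirical_pmf(sample):
    """tilde p_n(j) = (1/n) #{i : X_i = j}, for j = 0, ..., max(sample)."""
    sample = np.asarray(sample)
    return np.bincount(sample) / sample.size

def _pad(v, m):
    v = np.asarray(v, dtype=float)
    return np.pad(v, (0, m - len(v)))

def Q_n(f, p_tilde):
    """Q_n(f) = 1/2 sum_j f(j)^2 - sum_j f(j) p_tilde(j), f with finite support."""
    m = max(len(f), len(p_tilde))
    f, p = _pad(f, m), _pad(p_tilde, m)
    return 0.5 * np.sum(f ** 2) - np.sum(f * p)

def least_squares(f, p_tilde):
    """sum_j (f(j) - p_tilde(j))^2."""
    m = max(len(f), len(p_tilde))
    return np.sum((_pad(f, m) - _pad(p_tilde, m)) ** 2)

p_tilde = empirical_pmf([0, 1, 1, 1, 2])   # (0.2, 0.6, 0.2): not convex
f = [0.4, 0.3, 0.2, 0.1]                   # T_4, a convex candidate
# least squares criterion = 2 Q_n(f) + sum_j p_tilde(j)^2
print(least_squares(f, p_tilde), 2 * Q_n(f, p_tilde) + np.sum(p_tilde ** 2))
```

Because $\sum_j \tilde p_n^2(j)$ does not depend on $f$, both criteria have the same minimizer over any set of candidates.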

The following theorem shows that the constrained LSE of $p_0$ over $\mathcal{C}$ exists and is unique. It also shows that $\hat p_n$ is the minimizer of $Q_n$ over the set $\mathcal{K}$ and that it has a finite support. We denote by $\tilde s_n$ (respectively $\hat s_n$) the maximum of the support of $\tilde p_n$ (respectively $\hat p_n$).

Theorem 1. There exists a unique $\hat p_n \in \mathcal{C}$ such that

$$ Q_n(\hat p_n) = \inf_{p \in \mathcal{C}} Q_n(p) = \inf_{p \in \mathcal{K}} Q_n(p). $$

Moreover, the support of $\hat p_n$ is finite, and $\hat s_n \ge \tilde s_n$.

2.2. Comparison between constrained and unconstrained estimators 

In Theorem 2, we show the benefits of using the constrained LSE rather than the (unconstrained) empirical estimator $\tilde p_n$, in terms of the $\ell_2$-loss. Specifically, the constrained LSE is always at least as close to the unknown underlying distribution $p_0$ as the unconstrained estimator $\tilde p_n$ is. Moreover, whenever $p_0$ is not strictly convex on its support, the constrained LSE is strictly closer with a strictly positive probability (asymptotically, a probability of at least 1/2).
Theorem 2. Let $p_0$, $\tilde p_n$ and $\hat p_n$ be defined as in Section 2.1. We have

$$ \sum_{j \ge 0} \big(p_0(j) - \hat p_n(j)\big)^2 \le \sum_{j \ge 0} \big(p_0(j) - \tilde p_n(j)\big)^2, \tag{3} $$

with a strict inequality if $\tilde p_n$ is non-convex. Assume that there exist $i, j \in \mathbb{N}$ such that $j \ge i + 2$, $p_0(i) > 0$, and $p_0$ is linear over $\{i, \dots, j\}$. Then,

$$ \liminf_{n \to \infty} P\big(\tilde p_n \text{ is non-convex}\big) \ge 1/2, \tag{4} $$

and

$$ \liminf_{n \to \infty} P\Big(\sum_{k \ge 0} \big(p_0(k) - \hat p_n(k)\big)^2 < \sum_{k \ge 0} \big(p_0(k) - \tilde p_n(k)\big)^2\Big) \ge 1/2. \tag{5} $$

Remark: as the proof of Theorem 2 shows, Equation (3) also holds with $p_0$ replaced by any $g \in \mathcal{K}$ that belongs to $\ell_2$, i.e., that satisfies $\sum_{j \ge 0} g^2(j) < \infty$.
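The sketch below illustrates (3) and the remark on a hypothetical toy sample $0, 1, 1, 1, 2$, whose empirical pmf $(0.2, 0.6, 0.2)$ is not convex. For this sample the constrained LSE is $T_4 = (0.4, 0.3, 0.2, 0.1)$. We checked this by hand using the characterization of Lemma 2; the value is hard-coded below, not computed. The script compares squared $\ell_2$ distances to a few convex targets $g$, all truncated to a common finite grid. Truncation does not affect the comparison, because both estimators vanish beyond the grid. NumPy and the geometric parametrization $\gamma(1-\gamma)^i$ are assumptions.

```python
import numpy as np

p_tilde = np.array([0.2, 0.6, 0.2])        # empirical pmf of the toy sample
p_hat = np.array([0.4, 0.3, 0.2, 0.1])     # constrained LSE (T_4), hard-coded

def pad(v, m):
    return np.pad(v, (0, m - len(v)))

def convex_targets(m):
    """Some g in K, truncated to {0, ..., m-1}; geometric law assumed gamma (1-gamma)^i."""
    i = np.arange(m)
    targets = {f"T_{j}": np.where(i < j, 2.0 * (j - i) / (j * (j + 1)), 0.0)
               for j in (2, 3, 5)}
    targets.update({f"geom({gam})": gam * (1 - gam) ** i for gam in (0.9, 0.5, 0.1)})
    return targets

m = 200
for name, g in convex_targets(m).items():
    d_hat = np.sum((g - pad(p_hat, m)) ** 2)       # distance from the constrained LSE
    d_tilde = np.sum((g - pad(p_tilde, m)) ** 2)   # distance from the empirical pmf
    print(f"{name:10s} {d_hat:.4f} < {d_tilde:.4f}")
```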

We now consider the estimation of some characteristics of the distribution, namely the expectation, the centered moments and the probability at 0. As estimators of these characteristics, we naturally use the corresponding characteristics of the constrained and the unconstrained estimators. Theorem 3 states that the distributions $\tilde p_n$ and $\hat p_n$ have the same expectation, but the centered moments of $\tilde p_n$ are no larger than those of $\hat p_n$. In particular, the variance of $\hat p_n$ is at least as large as the variance of $\tilde p_n$. Moreover, the constrained estimator $\hat p_n(0)$ is greater than or equal to the unconstrained estimator $\tilde p_n(0)$. The performance of $\hat p_n(0)$ is compared with that of $\tilde p_n(0)$ through simulation studies in Section 4.

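Theorem 3. Let $\tilde p_n$ and $\hat p_n$ be defined as in Section 2.1. For all $n \ge 1$, $u \ge 1$ and $a \in \mathbb{R}$, we have

$$ \sum_{i=0}^{\tilde s_n} |i - a|^u\, \tilde p_n(i) \le \sum_{i=0}^{\hat s_n} |i - a|^u\, \hat p_n(i). \tag{6} $$

Moreover, $\sum_{i \ge 1} i\, \hat p_n(i) = \sum_{i \ge 1} i\, \tilde p_n(i)$ and $\hat p_n(0) \ge \tilde p_n(0)$.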
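Theorem 3 can be checked on the toy sample $0, 1, 1, 1, 2$, whose constrained LSE is $T_4$ (both are illustrative and hard-coded). In this minimal NumPy sketch the two distributions have the same mean, $1$. The variance increases from $0.4$ to $1.0$, and $\hat p_n(0) = 0.4 \ge \tilde p_n(0) = 0.2$. Inequality (6) holds on a small grid of $(u, a)$ values.

```python
import numpy as np

support = np.arange(4)
p_tilde = np.array([0.2, 0.6, 0.2, 0.0])   # toy sample 0, 1, 1, 1, 2
p_hat = np.array([0.4, 0.3, 0.2, 0.1])     # its constrained LSE (T_4)

def abs_moment(p, u, a):
    """sum_i |i - a|^u p(i), the quantity compared in (6)."""
    return float(np.sum(np.abs(support - a) ** u * p))

mean_tilde, mean_hat = abs_moment(p_tilde, 1, 0), abs_moment(p_hat, 1, 0)
var_tilde = abs_moment(p_tilde, 2, mean_tilde)
var_hat = abs_moment(p_hat, 2, mean_hat)

# (6) on a grid of exponents u >= 1 and centres a
ok = all(abs_moment(p_tilde, u, a) <= abs_moment(p_hat, u, a) + 1e-12
         for u in (1, 1.5, 2, 3) for a in (-1.0, 0.0, 0.5, 1.0, 2.5))
print(mean_tilde, mean_hat, var_tilde, var_hat, ok)
```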
It can be shown that similar results hold for constrained estimators of a convex density function, where $\tilde p_n$ is replaced by an unconstrained estimator of the density function and $\hat p_n$ by the corresponding constrained estimator. In contrast, for a discrete log-concave distribution, Balabdaoui et al. (2011) show (see their Equations (3.5) and (3.6)) that the moments of the constrained maximum likelihood estimator are smaller than those of the empirical distribution. These authors refer to similar results for the maximum likelihood estimator of a continuous log-concave density.

3. Implementing the constrained LSE 

3.1. More on convex discrete functions 

The aim of this section is to prove that any $f \in \mathcal{K}$ is a combination of the triangular functions $T_j$ defined below, and that this combination is unique. This result is comparable to Propositions 2.1 and 2.2 in Balabdaoui and Wellner (2005), which deal with convex (and, more generally, $k$-monotone) density functions on $(0, \infty)$. For every integer $j \ge 1$, we define the $j$-th triangular function $T_j$ on $\mathbb{N}$ by

$$ T_j(i) = \begin{cases} \dfrac{2(j - i)}{j(j+1)} & \text{for all } i \in \{0, \dots, j-1\}, \\[4pt] 0 & \text{for all integers } i \ge j. \end{cases} $$



Note that $T_j$ is normalized so that it is a probability mass function, i.e., $T_j(i) \ge 0$ for all $i$ and

$$ \sum_{i \ge 0} T_j(i) = 1. $$

Moreover, $T_j$ is non-increasing and convex on $\mathbb{N}$. Hereafter, we denote by $\mathcal{M}$ the convex cone of non-negative measures on $\mathbb{N} \setminus \{0\}$, and by $\pi_j$, for $j \in \mathbb{N} \setminus \{0\}$, the components of $\pi \in \mathcal{M}$.

Theorem 4. Let $f : \mathbb{N} \to [0, \infty)$ be such that $\lim_{i \to \infty} f(i) = 0$.
1. We have $f \in \mathcal{K}$ if and only if there exists $\pi \in \mathcal{M}$ such that

$$ f(i) = \sum_{j \ge 1} \pi_j T_j(i) \quad \text{for all } i \ge 0. \tag{7} $$

2. Assume $f \in \mathcal{K}$. Then $\pi$ in (7) is uniquely defined by

$$ \pi_j = \frac{j(j+1)}{2} \big(f(j+1) + f(j-1) - 2 f(j)\big) \quad \text{for all } j \ge 1. \tag{8} $$

3. Assume $f \in \mathcal{K}$. Then $\pi$ is a probability measure over $\mathbb{N} \setminus \{0\}$ if and only if $f$ is a probability mass function.

Note that, according to (8), $\pi$ puts mass at point $j$ if and only if $f$ changes slope at point $j$. Moreover, when the support of $f$ is not empty, let $s$ denote its maximum. The greatest point at which $f$ changes slope is then $s + 1$: the left-hand slope of $f$ at this point, $f(s+1) - f(s)$, is strictly negative, whereas the right-hand slope, $f(s+2) - f(s+1)$, is zero. Therefore, when the support of $f$ is not empty, the greatest point where $\pi$ puts mass is $s + 1$. Obviously, if $f(j) = 0$ for all $j \ge 0$, then $\pi_j = 0$ for all $j \ge 1$.
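Equations (7) and (8) translate directly into code. The sketch below recovers the mixing weights of a convex function from its second differences and then rebuilds the function from these weights. NumPy is assumed and the function names are hypothetical. For the mixture $0.3\,T_2 + 0.7\,T_5$, the sketch returns weights $0.3$ and $0.7$ at $j = 2$ and $j = 5$. In line with part 3 of Theorem 4, these weights sum to one because the function is a pmf.

```python
import numpy as np

def triangular(j, m):
    """T_j(0), ..., T_j(m-1): 2(j - i)/(j(j+1)) for i < j, 0 otherwise."""
    i = np.arange(m)
    return np.where(i < j, 2.0 * (j - i) / (j * (j + 1)), 0.0)

def mixing_weights(f):
    """pi_1, ..., pi_m from Equation (8), f given on {0, ..., m-1} and 0 beyond."""
    g = np.append(np.asarray(f, dtype=float), [0.0, 0.0])
    j = np.arange(1, len(f) + 1)
    return j * (j + 1) / 2.0 * (g[j + 1] + g[j - 1] - 2.0 * g[j])

def mixture(pi, m):
    """f(i) = sum_j pi_j T_j(i), Equation (7), for i = 0, ..., m-1."""
    return sum(w * triangular(j, m) for j, w in enumerate(pi, start=1))

f = mixture([0.0, 0.3, 0.0, 0.0, 0.7], 6)
pi = mixing_weights(f)
print(np.round(pi, 12), pi.sum())
```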

3.2. Algorithm 

Define the criterion function

$$ \Psi_n(\pi) = \frac{1}{2} \sum_{i \ge 0} \Big(\sum_{j \ge 1} \pi_j T_j(i)\Big)^2 - \sum_{i \ge 0} \sum_{j \ge 1} \pi_j T_j(i)\, \tilde p_n(i) $$

for all $\pi \in \mathcal{M}$. We define this criterion function because $\Psi_n(\pi) = Q_n(p)$ for all $p \in \mathcal{K}$ and $\pi \in \mathcal{M}$ satisfying (7) with $f$ replaced by $p$. The constrained LSE of $p_0$ is the unique minimizer of $Q_n(p)$ over $p \in \mathcal{K}$. It follows from Theorem 4 that there exists a unique $\hat\pi_n \in \mathcal{M}$ that minimizes $\Psi_n(\pi)$ over $\pi \in \mathcal{M}$, and that $\hat p_n$ and $\hat\pi_n$ are linked by the relation

$$ \hat p_n(i) = \sum_{j \ge 1} \hat\pi_{n,j}\, T_j(i) \quad \text{for all } i \ge 0. \tag{9} $$

Therefore, computing the constrained LSE $\hat p_n$ of $p_0$ amounts to computing the measure $\hat\pi_n$ that minimizes $\Psi_n(\pi)$ over $\pi \in \mathcal{M}$. Moreover, Theorems 1 and 4 show that $\hat\pi_n$ is a probability measure whose support is finite, with greatest point $\hat s_n + 1$.

For all $L \ge 1$, let $\mathcal{M}^L$ be the set of measures $\pi \in \mathcal{M}$ whose support is a subset of $\{1, \dots, L\}$. It can easily be shown that, for any $L \ge 1$, the minimizer of $\Psi_n(\pi)$ over $\pi \in \mathcal{M}^L$ exists and is unique. We denote this minimizer by $\hat\pi_n^L$. For any $L \ge \tilde s_n + 1$, we calculate $\hat\pi_n^L$ using the support reduction algorithm proposed by Groeneboom et al. (2008).

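We first introduce some notation. Let $\nu$, $\mu$ be two measures in $\mathcal{M}$. The derivative of $\Psi_n$ in the direction $\nu$ at $\mu$ is defined as

$$ [D_\nu(\Psi_n)](\mu) = \lim_{\varepsilon \downarrow 0} \frac{1}{\varepsilon} \big(\Psi_n(\mu + \varepsilon \nu) - \Psi_n(\mu)\big), $$

for all $\mu$ and $\nu$ such that $\Psi_n(\mu)$ and $\Psi_n(\nu)$ are finite. It can be written as

$$ [D_\nu(\Psi_n)](\mu) = \sum_{j \ge 1} \nu_j\, [d_j(\Psi_n)](\mu), \tag{10} $$

where $\delta_j$ denotes the Dirac measure at $j$ and

$$ [d_j(\Psi_n)](\mu) = \lim_{\varepsilon \downarrow 0} \frac{1}{\varepsilon} \big(\Psi_n(\mu + \varepsilon \delta_j) - \Psi_n(\mu)\big). $$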
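Because $\Psi_n$ is quadratic in $\pi$, the limit defining $[d_j(\Psi_n)](\mu)$ has the closed form $\sum_{i \ge 0} (f_\mu(i) - \tilde p_n(i))\, T_j(i)$, where $f_\mu = \sum_k \mu_k T_k$. Equation (10) then simply expresses linearity in $\nu$. The sketch below computes these quantities and compares them with one-sided finite differences. NumPy is assumed, the names are hypothetical, and all measures are restricted to $\{1, \dots, L\}$.

```python
import numpy as np

def design(L, m):
    """m x L matrix; column j-1 holds T_j(0), ..., T_j(m-1), j = 1, ..., L."""
    i = np.arange(m)[:, None]
    j = np.arange(1, L + 1)[None, :]
    return np.where(i < j, 2.0 * (j - i) / (j * (j + 1)), 0.0)

def psi_n(pi, p_tilde, A):
    """Psi_n(pi) = Q_n(sum_j pi_j T_j) for pi supported by {1, ..., L}."""
    f = A @ pi
    return 0.5 * f @ f - f @ p_tilde

def d_psi(pi, p_tilde, A):
    """Vector of [d_j(Psi_n)](pi), j = 1, ..., L (closed form)."""
    return A.T @ (A @ pi - p_tilde)

L = m = 5                    # m must cover the supports of p_tilde and T_L
p_tilde = np.array([0.2, 0.6, 0.2, 0.0, 0.0])
A = design(L, m)
mu = np.array([0.1, 0.0, 0.3, 0.0, 0.4])
nu = np.array([0.0, 1.0, 0.0, 0.5, 0.0])
eps = 1e-7
fd = (psi_n(mu + eps * nu, p_tilde, A) - psi_n(mu, p_tilde, A)) / eps
print(fd, nu @ d_psi(mu, p_tilde, A))   # Equation (10): should nearly agree
```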

The algorithm for calculating $\hat\pi_n^L$ for a fixed $L$ is as follows.

1. Initialisation

Let $S = \{L\}$ and choose the measure $\pi^S$ such that

$$ \pi^S_j = 0 \quad \text{for } 1 \le j \le L - 1, \qquad \pi^S_L = \arg\min_{\pi} \sum_{i=0}^{L-1} \big(\tilde p_n(i) - \pi\, T_L(i)\big)^2. $$

2. Optimisation over $\mathcal{M}^L$

Step 1: For $1 \le j \le L$, calculate the quantities $[d_j(\Psi_n)](\pi^S)$. If all are non-negative, then set $\hat\pi_n^L = \pi^S$; the optimisation over $\mathcal{M}^L$ is complete. If not, choose $j$ such that $[d_j(\Psi_n)](\pi^S) < 0$ and set $S' = S \cup \{j\}$. For example, one can take $j$ as the minimizer of $[d_j(\Psi_n)](\pi^S)$. Go to Step 2.

Step 2: Let $\pi^{S'}$ be the minimizer of $\Psi_n(\pi)$ over all signed measures $\pi$ (i.e., without the non-negativity constraint) such that $\mathrm{Supp}(\pi) \subset S'$. Two cases must be considered:
(a) If $\pi^{S'}_l \ge 0$ for all $l \in S'$, then set $\pi^S = \pi^{S'}$ and $S = S'$, and return to Step 1.
(b) If not, define $l$ as

$$ l = \arg\min_{j'} \left\{ \frac{\pi^S_{j'}}{\pi^S_{j'} - \pi^{S'}_{j'}} \right\}, $$

where the minimum is taken over $j'$ such that $\pi^{S'}_{j'} < \pi^S_{j'}$. Set $S' = S' \setminus \{l\}$ and return to Step 2.



Theorem 5. The estimator $\hat\pi_n^L$ given by the algorithm described above minimizes $\Psi_n(\pi)$ over $\pi \in \mathcal{M}^L$.

The following theorem then allows us to determine a suitable value of $L$.

Theorem 6. Let $L \ge \tilde s_n + 1$. If $\hat\pi_n^L$ is a probability measure, then $\hat\pi_n^L = \hat\pi_n$.

One possibility is to carry out the optimisation over $\mathcal{M}^L$ for increasing values of $L$ until the condition $\sum_{j \ge 1} \hat\pi^L_{n,j} = 1$ is satisfied. As the support of $\hat\pi_n$ is finite, the condition is fulfilled after a finite number of steps.
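The algorithm above, combined with the outer loop on $L$ suggested by Theorem 6, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' R code. NumPy is assumed, and the function names are hypothetical. The unconstrained minimizer of Step 2 is computed by least squares on the columns $T_j$, $j \in S'$. In case (b), the sketch also moves $\pi^S$ towards $\pi^{S'}$ until the weight at $l$ vanishes before dropping $l$; this follows our reading of the support reduction step of Groeneboom et al. (2008). Numerical tolerances replace the exact sign tests and the condition $\sum_j \hat\pi^L_{n,j} = 1$.

```python
import numpy as np

def triangular_matrix(L, m):
    """m x L matrix; column j-1 holds T_j(0), ..., T_j(m-1)."""
    i = np.arange(m)[:, None]
    j = np.arange(1, L + 1)[None, :]
    return np.where(i < j, 2.0 * (j - i) / (j * (j + 1)), 0.0)

def support_reduction(p_tilde, L, tol=1e-12, max_iter=10_000):
    """Sketch: minimise Psi_n over measures supported by {1, ..., L}."""
    m = max(L, len(p_tilde))
    p = np.zeros(m)
    p[: len(p_tilde)] = p_tilde
    A = triangular_matrix(L, m)
    pi = np.zeros(L)
    a = A[:, L - 1]
    pi[L - 1] = (a @ p) / (a @ a)            # initialisation with S = {L}
    S = {L - 1}                               # 0-based column indices
    for _ in range(max_iter):
        d = A.T @ (A @ pi - p)                # [d_j(Psi_n)](pi), j = 1..L
        j = int(np.argmin(d))
        if d[j] >= -tol:                      # Step 1: all derivatives >= 0
            return pi
        S_new = S | {j}
        while True:                           # Step 2
            idx = sorted(S_new)
            coef = np.linalg.lstsq(A[:, idx], p, rcond=None)[0]
            pi_new = np.zeros(L)
            pi_new[idx] = coef
            if np.all(coef >= -tol):          # case (a): accept new support
                pi, S = np.clip(pi_new, 0.0, None), S_new
                break
            # case (b): step towards pi_new until a weight hits zero, drop it
            cand = [k for k in idx if pi_new[k] < pi[k]]
            ratios = [pi[k] / (pi[k] - pi_new[k]) for k in cand]
            k_min = int(np.argmin(ratios))
            l, t = cand[k_min], ratios[k_min]
            pi = pi + t * (pi_new - pi)
            pi[l] = 0.0
            S_new = S_new - {l}
    raise RuntimeError("support reduction did not converge")

def convex_lse(sample, tol=1e-9, max_L=10_000):
    """Increase L from s_tilde + 1 until the weights sum to one (Theorem 6)."""
    sample = np.asarray(sample)
    p_tilde = np.bincount(sample) / sample.size
    for L in range(len(p_tilde), max_L):
        pi = support_reduction(p_tilde, L)
        if abs(pi.sum() - 1.0) < tol:
            return triangular_matrix(L, L) @ pi   # p_hat(0), ..., p_hat(L-1)
    raise RuntimeError("no admissible L found")

print(convex_lse([0, 1, 1, 1, 2]))   # non-convex empirical pmf (0.2, 0.6, 0.2)
```

On the toy sample above, the loop stops at $L = 4$ and returns $T_4 = (0.4, 0.3, 0.2, 0.1)$. If the empirical pmf is already convex (e.g. the sample $0, 0, 0, 1, 1, 2$, whose empirical pmf is $T_3$), the sketch returns it unchanged.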

4. Simulation study 

4.1. Simulation design

We designed a simulation study to assess the quality of the constrained estimator $\hat p_n$ and to compare it with the unconstrained estimator $\tilde p_n$.

We considered two shapes for the distribution $p_0$: the geometric distribution $\mathcal{G}(\gamma)$ ($\gamma = .9, .5, .1$), whose support is infinite, and the pure triangular distribution $T_j$ ($j = 20, 5, 2$). For each distribution, we considered three sample sizes: $n = 10$, 100 and 1000. We also considered the Poisson distribution with mean $\lambda$, which is convex as long as $\lambda$ does not exceed $\lambda^* = 2 - \sqrt{2} \approx .59$. We considered $\lambda = .59$, .8 and 1. For each simulation configuration, 1000 random samples were generated. The simulations were carried out with R (www.r-project.org), using functions available at the following website: http://w3.jouy.inra.fr/unites/miaj/public/perso/SylvieHuet_en.html.
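The distributions of the simulation design are easy to reproduce. The following NumPy sketch is an illustration, not the authors' R functions. The geometric law is assumed to be parametrized as $\gamma(1-\gamma)^i$ on $\mathbb{N}$, since the text does not state the parametrization. Infinite supports are truncated, and all function names are hypothetical. The sketch also checks the Poisson threshold numerically. The quantity $p(0) - 2p(1) + p(2) = e^{-\lambda}(1 - 2\lambda + \lambda^2/2)$ is non-negative if and only if $\lambda \le 2 - \sqrt{2}$ (on the relevant range). The second differences at larger $i$ impose only weaker constraints.

```python
import math
import numpy as np

def geometric_pmf(gamma, m):
    """Assumed parametrisation p(i) = gamma (1 - gamma)^i, truncated to i < m."""
    return gamma * (1.0 - gamma) ** np.arange(m)

def triangular_pmf(j):
    """T_j on its support {0, ..., j-1}."""
    i = np.arange(j)
    return 2.0 * (j - i) / (j * (j + 1))

def poisson_pmf(lam, m):
    """Poisson(lam) probabilities p(0), ..., p(m-1)."""
    return np.array([math.exp(-lam) * lam ** i / math.factorial(i) for i in range(m)])

def is_convex(p, tol=1e-12):
    """Second differences of p (extended by zeros) are non-negative."""
    q = np.append(p, [0.0, 0.0])
    return bool(np.all(q[:-2] - 2.0 * q[1:-1] + q[2:] >= -tol))

def draw_sample(pmf, n, rng):
    """n i.i.d. draws from pmf on {0, ..., len(pmf)-1}, renormalised if truncated."""
    pmf = np.asarray(pmf, dtype=float)
    return rng.choice(len(pmf), size=n, p=pmf / pmf.sum())

LAMBDA_STAR = 2.0 - math.sqrt(2.0)   # about 0.586
print(is_convex(poisson_pmf(LAMBDA_STAR - 1e-6, 40)),   # True
      is_convex(poisson_pmf(0.8, 40)))                   # False
```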
4.2. Global fit

We first compared the fit of the estimated distributions $\hat p_n$ and $\tilde p_n$ to the entire distribution $p_0$. To this end, for each simulated sample, we computed the $\ell_2$-loss of $\hat p_n$,

$$ \sum_{j \ge 0} \big(\hat p_n(j) - p_0(j)\big)^2, $$

and likewise for $\tilde p_n$. The expected $\ell_2$-loss is estimated by the mean over 1000 simulations, and the results are displayed in Figure 1.

As expected from Theorem 2, the constrained estimator $\hat p_n$ outperforms the empirical estimator in terms of $\ell_2$-loss in all configurations. The difference is larger in the triangular case because $p_0$ is linear over a region of its support. As the true distribution $p_0$ becomes more convex, i.e., for $\gamma = .9$ or $j = 2$, the empirical estimator $\tilde p_n$ improves and gets closer to $\hat p_n$.

Theorem 2 provides theoretical grounds for these results under the $\ell_2$-loss. We also considered the Kolmogorov loss:

$$ K(\hat p_n, p_0) = \sup_{i} \big|\hat P_n(i) - P_0(i)\big|, $$

where $P_0$ is the true cumulative distribution function (cdf) and $\hat P_n$ is the constrained cdf. The Kolmogorov loss of the empirical cdf $\tilde P_n$ was calculated in the same way. As shown in Figure 1 (bottom), the Kolmogorov loss behaves similarly to the $\ell_2$-loss. The same behavior was observed for the Hellinger loss and the total variation loss (results not shown). The constrained estimator $\hat p_n$ thus outperformed the empirical estimator for all the losses considered.
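The four losses mentioned above can be computed as follows for pmfs given as finite arrays. This sketch relies on some assumptions. The text does not give the normalizations of the Hellinger and total variation losses, so the common conventions $\{\frac12\sum_i(\sqrt{p(i)}-\sqrt{p_0(i)})^2\}^{1/2}$ and $\frac12\sum_i|p(i)-p_0(i)|$ are used here. NumPy and the function name are also assumptions.

```python
import numpy as np

def losses(p, p0):
    """l2, Kolmogorov, Hellinger and total variation losses between pmfs on N."""
    m = max(len(p), len(p0))
    p = np.pad(np.asarray(p, dtype=float), (0, m - len(p)))
    p0 = np.pad(np.asarray(p0, dtype=float), (0, m - len(p0)))
    return {
        "l2": float(np.sum((p - p0) ** 2)),
        # sup_i |P(i) - P0(i)|, with P and P0 the cumulative distribution functions
        "kolmogorov": float(np.max(np.abs(np.cumsum(p) - np.cumsum(p0)))),
        # assumed convention for the Hellinger loss
        "hellinger": float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(p0)) ** 2))),
        # assumed convention for the total variation loss
        "total_variation": float(0.5 * np.sum(np.abs(p - p0))),
    }

print(losses([0.4, 0.3, 0.2, 0.1], [0.5, 1 / 3, 1 / 6]))   # T_4 against T_3
```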
Figure 2: Variance. Relative standard error of the variance estimators as a function of the sample size $n$. Same legend as Figure 1.

4.3. Some characteristics of interest

In this section, we consider the estimation of some characteristics of the distribution, namely the variance, the entropy and the probability at 0. For each of these characteristics, denoted $\theta(p)$, we measured performance in terms of relative standard error:

$$ \frac{\sqrt{E\big(\theta(\hat p_n) - \theta(p_0)\big)^2}}{\theta(p_0)}. $$

The expectation was estimated by the mean over 1,000 simulations.

As shown in Section 2, the empirical and constrained distributions have equal means, whereas the variance of the constrained distribution is at least as large as that of the empirical one. Denoting by $\mu_k$ the centered moment of order $k$ of $p_0$, the mean and variance of the empirical variance are, respectively,

$$ \frac{n-1}{n}\, \mu_2 \qquad \text{and} \qquad \frac{n-1}{n^3} \big((n-1)\mu_4 - (n-3)\mu_2^2\big). $$

Figure 2 shows that the relative standard error of the constrained estimator is smaller than that of the empirical one. The constrained variance is therefore the more accurate estimator.
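The mean and variance formulas above refer to the empirical variance with divisor $n$. They can be verified exactly, without simulation, by enumerating all $n$-samples from a small distribution. The sketch below does so for $p_0 = T_3$ and $n = 4$. It also implements the relative standard error used in this section, with $E$ replaced by an average over replications. NumPy is assumed and the names are hypothetical.

```python
import itertools
import numpy as np

def empirical_variance(sample):
    """Variance of the empirical distribution of the sample (divisor n)."""
    x = np.asarray(sample, dtype=float)
    return float(np.mean((x - x.mean()) ** 2))

def exact_mean_and_variance(pmf, n):
    """Exact mean and variance of empirical_variance over all n-samples from pmf."""
    mean = second = 0.0
    for sample in itertools.product(range(len(pmf)), repeat=n):
        w = float(np.prod([pmf[k] for k in sample]))   # probability of this sample
        v = empirical_variance(sample)
        mean += w * v
        second += w * v * v
    return mean, second - mean ** 2

def formula_mean_and_variance(pmf, n):
    """(n-1)/n mu_2 and (n-1)/n^3 ((n-1) mu_4 - (n-3) mu_2^2)."""
    pmf = np.asarray(pmf, dtype=float)
    i = np.arange(len(pmf))
    centred = i - np.sum(i * pmf)
    mu2, mu4 = np.sum(centred ** 2 * pmf), np.sum(centred ** 4 * pmf)
    return (n - 1) / n * mu2, (n - 1) / n ** 3 * ((n - 1) * mu4 - (n - 3) * mu2 ** 2)

def relative_standard_error(estimates, true_value):
    """sqrt(mean((theta_hat - theta)^2)) / theta over simulation replications."""
    est = np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean((est - true_value) ** 2)) / true_value)

T3 = [0.5, 1 / 3, 1 / 6]
print(exact_mean_and_variance(T3, 4), formula_mean_and_variance(T3, 4))
```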

We also investigated the estimation of the entropy

$$ H(p) = -\sum_{i \ge 0} p(i) \log p(i), $$

which is often used in ecology as a diversity index. As shown in Figure 3, $H(\hat p_n)$ estimates the true entropy better than $H(\tilde p_n)$ in most situations. The difference between the two estimators vanishes as the true distribution becomes more convex. $H(\hat p_n)$ performs worst when the true distribution is $T_2$. This distribution is a special case: more than half of the estimation errors consist of adding a component $T_j$ ($j > 2$) to the mixture $\sum_j \pi_j T_j$, which increases the entropy.

Figure 3: Entropy. Relative standard error of the entropy estimators as a function of the sample size $n$. Same legend as Figure 1.
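The entropy can be implemented directly, as sketched below. The sketch uses the convention $0 \log 0 = 0$ and the natural logarithm; the text does not specify the base, so this is an assumption. The toy comparison illustrates the remark on $T_2$: replacing $T_2$ by the mixture $0.9\,T_2 + 0.1\,T_3$ increases the entropy. This is one example, not a general statement. NumPy and the names are assumptions.

```python
import numpy as np

def entropy(p):
    """H(p) = -sum_i p(i) log p(i), natural logarithm, with 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)))

T2 = np.array([2 / 3, 1 / 3, 0.0])
T3 = np.array([1 / 2, 1 / 3, 1 / 6])
print(entropy(T2), entropy(0.9 * T2 + 0.1 * T3))   # about 0.637 and 0.714
```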

We then considered the estimation of the probability mass $p(0)$. Theorem 3 showed that the constrained estimator $\hat p_n(0)$ is greater than or equal to the empirical estimator $\tilde p_n(0)$, which is known to be unbiased. Nevertheless, Figure 4 shows that the constrained estimator $\hat p_n(0)$ still provides a more accurate estimate of $p_0(0)$ than $\tilde p_n(0)$ does.

For all these characteristics, the constrained distribution provides better 
estimates than the empirical distribution, provided that the true distribution 
is indeed convex. 

4.4. Robustness to non-convexity

Finally, we studied the robustness of the constrained estimator to non-convexity. As an example, we considered the Poisson distribution with mean $\lambda$, which is convex as long as $\lambda \le \lambda^* = 2 - \sqrt{2} \approx .59$. We studied how $\hat p_n$ and $\tilde p_n$ behave, in terms of $\ell_2$-loss, when $\lambda$ exceeds $\lambda^*$.
The left panel of Figure 5 displays the Poisson distributions with means $\lambda^*$, .8 and 1. Figure 5 (right) shows that the $\ell_2$-loss of the constrained estimator increases with $\lambda$. However, for small sample sizes, $\hat p_n$ still fits better than $\tilde p_n$, at least for $\lambda \le 1$. The performance of $\hat p_n$ deteriorates dramatically when the sample size becomes large and the convexity assumption is strongly violated.

5. Proofs 

5.1. Proof of Theorem 1

To prove Theorem 1, we first show in the following lemma that the minimizer of $Q_n$ over $\mathcal{K}$ exists and is unique, where $\mathcal{K}$ is the set defined in Section 1. Then, after some intermediate results, we prove in Lemma 3 below that this minimizer belongs to $\mathcal{C}$. Since $\mathcal{C} \subset \mathcal{K}$, Theorem 1 follows from Lemma 1 combined with Lemma 3.

Notation. We denote by $N_n$ the number of distinct values of the $X_i$'s and by $X_{(1)}, \dots, X_{(N_n)}$ these distinct values in increasing order, i.e., $X_{(1)} < \dots < X_{(N_n)}$. We set $r_n = X_{(1)}$ and $\tilde s_n = X_{(N_n)}$.

When $\tilde s_n = 0$, i.e., $\tilde p_n(0) = 1$ and $\tilde p_n(i) = 0$ for all $i \ge 1$, the proof of Theorem 1 is straightforward. In the sequel, we therefore restrict ourselves to the case $\tilde s_n \ge 1$.

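Lemma 1. There exists a unique $\hat p_n \in \mathcal{K}$ such that

$$ Q_n(\hat p_n) = \inf_{p \in \mathcal{K}} Q_n(p). \tag{11} $$

Moreover, $\hat p_n$ has a finite support.

Proof. To prove the existence and uniqueness of $\hat p_n$, we first need the following preliminary results, where $q$ denotes a candidate minimizer of $Q_n$ over $\mathcal{K}$.

(i) There exists $c_1 = c_1(\omega) < \infty$, not depending on $q$, such that $q \le c_1$.

(ii) We have $q = \bar q$, where

$$ \bar q(i) = \begin{cases} q(i) & \text{for all } i \le \tilde s_n, \\ \max\{q(\tilde s_n) + (i - \tilde s_n)\big(q(\tilde s_n) - q(\tilde s_n - 1)\big),\, 0\} & \text{for all } i > \tilde s_n. \end{cases} \tag{12} $$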
Therefore, minimizing $Q_n$ over $\mathcal{K}$ amounts to minimizing $Q_n$ over the set of functions $q \in \mathcal{K}$ such that $q \le c_1$, $Q_n(q) \le Q_n(T_1)$, and $q = \bar q$. But for all $q \in \mathcal{K}$ such that $q = \bar q$, we have

$$ Q_n(q) = \frac{1}{2} \sum_{i=0}^{\tilde s_n} q^2(i) + \frac{1}{2} \sum_{i > \tilde s_n} \Big(\max\{q(\tilde s_n) + (i - \tilde s_n)\big(q(\tilde s_n) - q(\tilde s_n - 1)\big),\, 0\}\Big)^2 - \sum_{i=0}^{\tilde s_n} q(i)\, \tilde p_n(i). \tag{13} $$

The right-hand side of (13) depends only on the restriction $t$ of $q$ to $\{0, \dots, \tilde s_n\}$; denote it by $\bar Q_n(t)$. The problem therefore reduces to minimizing $\bar Q_n$ over the set $\bar K$ of non-increasing convex functions $t : \{0, \dots, \tilde s_n\} \to [0, \infty)$ such that $t(0) \le c_1$ and $\bar Q_n(t) \le Q_n(T_1)$. The set $\bar K$ is compact, and $\bar Q_n$ is continuous and strictly convex on $\bar K$, so $\bar Q_n$ has a unique minimizer over $\bar K$. Hence $Q_n$ has a unique minimizer over $\mathcal{K}$. It remains to prove results (i) and (ii).

Proof of (i). It is easy to see that, for all $p \in \mathcal{K}$,

$$ Q_n(p) \ge \frac{1}{2} p^2(r_n) - p(r_n), $$

since $p$ is non-increasing. This lower bound tends to infinity as $p(r_n) \to \infty$. Now consider $T_1$, the measure that puts mass 1 at 0. We have $Q_n(q) \le Q_n(T_1) < \infty$, so there exists $c < \infty$ such that $q(r_n) < c$. Moreover, $Q_n(T_1) \ge Q_n(q) \ge q^2(0)/2 - q(r_n)$. Therefore there exists $c_1 < \infty$ such that $q(0) \le c_1$, which means that $q \le c_1$.

Proof of (ii). By convexity we must have $q(i) \ge \bar q(i)$ for all $i \ge \tilde s_n$, and therefore

$$ Q_n(q) - Q_n(\bar q) = \frac{1}{2} \sum_{i > \tilde s_n} \big(q^2(i) - \bar q^2(i)\big) \ge 0, $$

with a strict inequality if $q \ne \bar q$. Hence any candidate minimizer $q$ of $Q_n$ over $\mathcal{K}$ must satisfy $q = \bar q$.

Let us now prove that the support of $\hat p_n$ is finite. If $\hat p_n(\tilde s_n) = 0$, then clearly $\hat p_n$ has a finite support included in $\{0, \dots, \tilde s_n - 1\}$. Now consider the case $\hat p_n(\tilde s_n) > 0$. First, $\hat p_n(\tilde s_n - 1) > \hat p_n(\tilde s_n)$. Otherwise we would have $\hat p_n(i) = \hat p_n(\tilde s_n)$ for all $i \ge \tilde s_n$, so that $Q_n(\hat p_n) = \infty$. Next, define $\bar q$ as in (12) with $q$ replaced by $\hat p_n$. By the proof of (ii), $\hat p_n = \bar q$, and $\bar q$ has a finite support since $\hat p_n(\tilde s_n - 1) > \hat p_n(\tilde s_n)$. □
The following lemma provides a precise characterization of $\hat p_n$. It is the discrete counterpart of Lemma 2.2 in Groeneboom et al. (2001), which treats the continuous case. For every $p \in \mathcal{K}$, we define

$$ F_p(j) = \sum_{i=0}^{j} p(i) \qquad \text{and} \qquad H_p(j) = \sum_{i=0}^{j} F_p(i) \tag{14} $$

for all integers $j \ge 0$, and $F_p(j) = H_p(j) = 0$ for all integers $j < 0$. Thus, $F_p$ is a distribution function when $p \in \mathcal{C}$.

Lemma 2. Let $\hat p_n$ be the unique function in $\mathcal{K}$ that satisfies (11). For all $l \ge 1$ we have

$$ H_{\hat p_n}(l - 1) \ge H_{\tilde p_n}(l - 1), \tag{15} $$

with equality if $\hat p_n$ changes slope at point $l$, i.e., if

$$ \hat p_n(l) - \hat p_n(l-1) < \hat p_n(l+1) - \hat p_n(l). $$

Conversely, suppose that $p \in \mathcal{K}$ satisfies $H_p(l-1) \ge H_{\tilde p_n}(l-1)$ for all $l \ge 1$, with equality whenever $p(l) - p(l-1) < p(l+1) - p(l)$. Then $p = \hat p_n$.
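Lemma 2 gives an optimality certificate that is easy to check numerically. The sketch below takes a candidate $p \in \mathcal{K}$ given on a finite grid. It tests whether $H_p(l-1) \ge H_{\tilde p_n}(l-1)$ for all $l$, with equality at every change of slope of $p$. NumPy is assumed and the names are hypothetical. On the toy sample $0, 1, 1, 1, 2$ (an illustrative assumption), $T_4$ passes the test, so by the converse part of Lemma 2 it is the constrained LSE. By contrast, $T_3$ and $0.9\,T_4$ fail.

```python
import numpy as np

def H(p):
    """H_p(j) = sum_{i <= j} F_p(i), with F_p the cumulative sums of p."""
    return np.cumsum(np.cumsum(p))

def satisfies_lemma2(p, p_tilde, m=20, tol=1e-12):
    """Check the characterization of Lemma 2 on {0, ..., m-1} (p assumed in K)."""
    p = np.pad(np.asarray(p, dtype=float), (0, m - len(p)))
    pt = np.pad(np.asarray(p_tilde, dtype=float), (0, m - len(p_tilde)))
    gap = H(p) - H(pt)                   # gap[l-1] = H_p(l-1) - H_ptilde(l-1)
    for l in range(1, m - 1):
        if gap[l - 1] < -tol:            # inequality (15) violated
            return False
        kink = p[l] - p[l - 1] < p[l + 1] - p[l] - tol   # change of slope at l
        if kink and abs(gap[l - 1]) > tol:               # equality required
            return False
    return True

p_tilde = [0.2, 0.6, 0.2]
print(satisfies_lemma2([0.4, 0.3, 0.2, 0.1], p_tilde),      # True
      satisfies_lemma2([0.5, 1 / 3, 1 / 6], p_tilde),       # False
      satisfies_lemma2([0.36, 0.27, 0.18, 0.09], p_tilde))  # False
```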

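Proof. First, $\tilde p_n$ has a finite support by definition, and Lemma 1 ensures that $\hat p_n$ has a finite support as well. Thus, all the sums involved in the proof are well defined and finite. For every $\varepsilon > 0$ and $l \ge 1$, define $q_{\varepsilon l}$ by $q_{\varepsilon l}(i) = \hat p_n(i)$ for all $i \ge l$ and

$$ q_{\varepsilon l}(i) = \hat p_n(i) + \varepsilon (l - i) $$

for all $i \in \{0, \dots, l\}$. Then $q_{\varepsilon l}$ is the sum of two convex functions, so $q_{\varepsilon l} \in \mathcal{K}$ for all $\varepsilon, l$. Since $\hat p_n$ minimizes $Q_n$ over $\mathcal{K}$, we have $Q_n(q_{\varepsilon l}) \ge Q_n(\hat p_n)$ for all $\varepsilon, l$, and therefore

$$ \lim_{\varepsilon \downarrow 0} \frac{1}{\varepsilon} \big(Q_n(q_{\varepsilon l}) - Q_n(\hat p_n)\big) \ge 0 $$

for all $l \ge 1$. This simplifies to

$$ \sum_{i=0}^{l} \hat p_n(i)(l - i) \ge \sum_{i=0}^{l} \tilde p_n(i)(l - i) $$

for all $l \ge 1$, which can be rewritten as

$$ \sum_{j=0}^{l-1} \sum_{i=0}^{j} \hat p_n(i) \ge \sum_{j=0}^{l-1} \sum_{i=0}^{j} \tilde p_n(i) $$

for all $l \ge 1$. This is precisely (15). To prove the equality case, note that $(1 + \varepsilon)\hat p_n \in \mathcal{K}$ for all $\varepsilon > -1$. Therefore, for all $\varepsilon > -1$,

$$ Q_n\big((1 + \varepsilon)\hat p_n\big) \ge Q_n(\hat p_n). $$

Distinguishing the cases $\varepsilon > 0$ and $\varepsilon < 0$, we obtain

$$ \liminf_{\varepsilon \downarrow 0} \frac{1}{\varepsilon} \big(Q_n((1 + \varepsilon)\hat p_n) - Q_n(\hat p_n)\big) \ge 0 \quad \text{and} \quad \limsup_{\varepsilon \uparrow 0} \frac{1}{\varepsilon} \big(Q_n((1 + \varepsilon)\hat p_n) - Q_n(\hat p_n)\big) \le 0. $$

Both limits are equal, so their common value is zero, which can be written as

$$ \sum_{i \ge 0} \hat p_n(i)\big(\hat p_n(i) - \tilde p_n(i)\big) = 0. $$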

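Now, since $p(i) = F_p(i) - F_p(i-1)$ for all $p \in \mathcal{K}$ and $i \in \mathbb{N}$, we obtain

$$ 0 = \sum_{i \ge 0} \hat p_n(i)\big(F_{\hat p_n}(i) - F_{\tilde p_n}(i)\big) - \sum_{i \ge 1} \hat p_n(i)\big(F_{\hat p_n}(i-1) - F_{\tilde p_n}(i-1)\big). $$

Rearranging the indices, we have

$$ \sum_{i \ge 1} \hat p_n(i)\big(F_{\hat p_n}(i-1) - F_{\tilde p_n}(i-1)\big) = \sum_{i \ge 0} \hat p_n(i+1)\big(F_{\hat p_n}(i) - F_{\tilde p_n}(i)\big), $$

whence

$$ 0 = \sum_{i \ge 0} \big(\hat p_n(i) - \hat p_n(i+1)\big)\big(F_{\hat p_n}(i) - F_{\tilde p_n}(i)\big). $$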
Next, $F_p(i) = H_p(i) - H_p(i-1)$ for all $p \in \mathcal{K}$ and $i \in \mathbb{N}$. A similar change of indices then yields

$$ 0 = \sum_{i \ge 0} \Big(\big(\hat p_n(i) - \hat p_n(i+1)\big) - \big(\hat p_n(i+1) - \hat p_n(i+2)\big)\Big)\big(H_{\hat p_n}(i) - H_{\tilde p_n}(i)\big). $$

It follows from (15) that $H_{\hat p_n}(i) \ge H_{\tilde p_n}(i)$ for all $i \ge 0$. Moreover, by convexity of $\hat p_n$,

$$ \hat p_n(i+1) - \hat p_n(i) \le \hat p_n(i+2) - \hat p_n(i+1). $$

A sum of non-negative numbers is zero if and only if all these numbers are zero, so we conclude that

$$ \Big(\big(\hat p_n(i) - \hat p_n(i+1)\big) - \big(\hat p_n(i+1) - \hat p_n(i+2)\big)\Big)\big(H_{\hat p_n}(i) - H_{\tilde p_n}(i)\big) = 0 $$

for all $i \ge 0$. Hence $H_{\hat p_n}(i) = H_{\tilde p_n}(i)$ for all $i \ge 0$ such that

$$ \hat p_n(i+1) - \hat p_n(i) < \hat p_n(i+2) - \hat p_n(i+1). $$

Setting $l = i + 1$, this means that equality holds in (15) whenever $\hat p_n$ changes slope at point $l$.

Conversely, consider $p \in \mathcal{K}$ such that $H_p(i) \ge H_{\tilde p_n}(i)$ for all $i \ge 0$, with equality whenever $p(i+1) - p(i) < p(i+2) - p(i+1)$. Then

$$ 0 = \sum_{i \ge 0} \big(p(i) - 2p(i+1) + p(i+2)\big)\big(H_p(i) - H_{\tilde p_n}(i)\big), \tag{16} $$

and $p$ has a finite support. To see this, suppose, for a contradiction, that the support of $p$ is not finite. Then there exists an increasing sequence $(u_l)_{l \in \mathbb{N}}$ with $u_l \to \infty$ as $l \to \infty$ such that $p$ changes slope at every point $u_l + 1$, $l \in \mathbb{N}$. This implies that

$$ H_p(u_l) - H_p(u_{l-1}) = H_{\tilde p_n}(u_l) - H_{\tilde p_n}(u_{l-1}) $$

for all $l \ge 1$. Since $F_p$ is non-decreasing and $\tilde p_n$ has a finite support, we obtain

$$ F_p(u_{l-1} + 1) \le \frac{H_p(u_l) - H_p(u_{l-1})}{u_l - u_{l-1}} = F_{\tilde p_n}(u_l) = \sum_{i \ge 0} \tilde p_n(i) $$

for all large enough $l$. Similarly,

$$ F_p(u_l) \ge \sum_{i \ge 0} \tilde p_n(i) $$

for all large enough $l$. Therefore,

$$ F_p(u_l) = F_p(u_{l+1}) = \sum_{i \ge 0} \tilde p_n(i) $$

for all large enough $l$, which means that $p(i) = 0$ for all large enough $i$. This contradicts the assumption that the support of $p$ is not finite, so the support of $p$ is finite.

Now, let $q \in \mathcal{K}$ be any candidate minimizer of $Q_n$ over $\mathcal{K}$. By the proof of Lemma 1, $Q_n(q) \le Q_n(T_1)$ and $q = \bar q$, where $\bar q$ is defined by (12). In particular, since $Q_n(q) < \infty$ and $q = \bar q$, the function $q$ has a finite support. Thus, we can write

$$ Q_n(q) - Q_n(p) = \frac{1}{2} \sum_{i \ge 0} \Big(q^2(i) - p^2(i) - 2\tilde p_n(i)\big(q(i) - p(i)\big)\Big) \ge \sum_{i \ge 0} \big(p(i) - \tilde p_n(i)\big)\big(q(i) - p(i)\big). \tag{17} $$

Since both $q$ and $p - \tilde p_n$ have a finite support, rearranging the indices as above shows that

$$ \sum_{i \ge 0} \big(p(i) - \tilde p_n(i)\big)\big(q(i) - p(i)\big) = \sum_{i \ge 0} \Big(\big(q(i) - p(i)\big) - 2\big(q(i+1) - p(i+1)\big) + \big(q(i+2) - p(i+2)\big)\Big)\big(H_p(i) - H_{\tilde p_n}(i)\big). $$

Combining this with (16) and (17) yields

$$ Q_n(q) - Q_n(p) \ge \sum_{i \ge 0} \big(q(i) - 2q(i+1) + q(i+2)\big)\big(H_p(i) - H_{\tilde p_n}(i)\big). $$

The right-hand side is non-negative, because $H_p(i) \ge H_{\tilde p_n}(i)$ for all $i \ge 0$ and $q$ is convex over $\mathbb{N}$. Hence $Q_n(q) \ge Q_n(p)$ for all candidates $q \in \mathcal{K}$. This means that $p$ minimizes $Q_n$ over $\mathcal{K}$, i.e., $p = \hat p_n$. □

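We are now in a position to prove that $\hat p_n$ is a probability mass function, i.e., $\hat p_n \in \mathcal{C}$.

Lemma 3. Let $\hat p_n$ be the unique function in $\mathcal{K}$ that satisfies (11). We have

$$ F_{\hat p_n}(\hat s_n + 1) = F_{\tilde p_n}(\hat s_n + 1), \tag{18} $$

$\hat s_n \ge \tilde s_n$ and $\hat p_n \in \mathcal{C}$.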

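Proof. We first prove that $\hat s_n$ is well defined, i.e., that $\hat p_n$ is not identically zero. Let $k = 1 + \min\{j : \tilde p_n(j) \ne 0\}$. It is easy to verify that there exists $a > 0$ such that $Q_n(a T_k) < 0$. Since $Q_n(0) = 0$, $\hat p_n$ cannot be identically zero, and $\hat s_n$ is well defined.

By definition of $\hat s_n$, $\hat p_n$ changes slope at point $\hat s_n + 1$, so it follows from Lemma 2 that

$$ \sum_{j=0}^{\hat s_n} F_{\hat p_n}(j) = \sum_{j=0}^{\hat s_n} F_{\tilde p_n}(j). \tag{19} $$

Using Lemma 2 again, we obtain

$$ \sum_{j=0}^{\hat s_n + 1} F_{\hat p_n}(j) \ge \sum_{j=0}^{\hat s_n + 1} F_{\tilde p_n}(j), $$

which, combined with (19), shows that $F_{\hat p_n}(\hat s_n + 1) \ge F_{\tilde p_n}(\hat s_n + 1)$. Let us first consider the case $\hat s_n \ge 1$. We have

$$ \sum_{i=0}^{\hat s_n - 1} F_{\hat p_n}(i) \ge \sum_{i=0}^{\hat s_n - 1} F_{\tilde p_n}(i), $$

which, combined with (19), shows that $F_{\hat p_n}(\hat s_n) \le F_{\tilde p_n}(\hat s_n)$. But $\hat p_n(\hat s_n + 1) = 0$ by definition of $\hat s_n$, so $F_{\hat p_n}(\hat s_n + 1) = F_{\hat p_n}(\hat s_n)$, and therefore

$$ F_{\tilde p_n}(\hat s_n) \ge F_{\hat p_n}(\hat s_n + 1) \ge F_{\tilde p_n}(\hat s_n + 1). $$

Since $F_{\tilde p_n}$ is non-decreasing by definition, we conclude that (18) holds.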
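Now consider the case $\hat s_n = 0$. We have $\tilde p_n(1) = 0$. Otherwise, we could modify $\hat p_n$ to a $q \in \mathcal{K}$ such that $q(0) = \hat p_n(0)$, $0 < q(1) \le \tilde p_n(1)$ and $q(i) = 0$ for all $i > 1$. This is a contradiction, since such a $q$ satisfies $Q_n(q) < Q_n(\hat p_n)$. Moreover, when $\hat s_n = 0$, we have $\hat p_n(0) = \tilde p_n(0)$. Otherwise, we could modify $\hat p_n$ to a $q \in \mathcal{K}$ such that $q(0) = \tilde p_n(0)$ and $q(i) = 0$ for all $i > 0$, which again gives $Q_n(q) < Q_n(\hat p_n)$, a contradiction. Hence,

$$ F_{\hat p_n}(1) = \hat p_n(0) = \tilde p_n(0) = F_{\tilde p_n}(1), $$

which completes the proof of (18).

To prove that $\hat s_n \ge \tilde s_n$, we argue by contradiction. Suppose first that $\hat s_n = \tilde s_n - 1$. This means that $\hat p_n(i) = 0$ for all $i \ge \tilde s_n$ and $\hat p_n(\tilde s_n - 1) > 0$. In this case, we can modify $\hat p_n$ to a $q \in \mathcal{K}$ such that $q(i) = \hat p_n(i)$ for all $i < \tilde s_n$, $0 < q(\tilde s_n) \le \tilde p_n(\tilde s_n)$, and $q(i) = 0$ for all $i > \tilde s_n$. Then

$$ 2\big(Q_n(q) - Q_n(\hat p_n)\big) = \sum_{i \ge 0} \big(q(i) - \tilde p_n(i)\big)^2 - \sum_{i \ge 0} \big(\hat p_n(i) - \tilde p_n(i)\big)^2 = \big(q(\tilde s_n) - \tilde p_n(\tilde s_n)\big)^2 - \big(\tilde p_n(\tilde s_n)\big)^2 < 0. $$

This contradicts the fact that $\hat p_n$ minimizes $Q_n$, so $\hat s_n \ne \tilde s_n - 1$. Suppose now that $\hat s_n < \tilde s_n - 1$. Then $F_{\tilde p_n}(\hat s_n + 1) < 1$, so (18) yields

$$ F_{\hat p_n}(j) = F_{\hat p_n}(\hat s_n + 1) = F_{\tilde p_n}(\hat s_n + 1) < 1 $$

for all $j \ge \hat s_n + 1$. Therefore, for all $l > \tilde s_n$,

$$ \sum_{j=0}^{l-1} \big(F_{\hat p_n}(j) - F_{\tilde p_n}(j)\big) = \sum_{j=0}^{\tilde s_n - 1} \big(F_{\hat p_n}(j) - F_{\tilde p_n}(j)\big) + (l - \tilde s_n)\big(F_{\hat p_n}(\hat s_n + 1) - 1\big), $$

which tends to $-\infty$ as $l \to \infty$. This is a contradiction, since by Lemma 2 this quantity must remain non-negative for all $l$. We conclude that $\hat s_n \ge \tilde s_n$. Combining this with (18) yields

$$ \sum_{i \ge 0} \hat p_n(i) = F_{\hat p_n}(\hat s_n + 1) = F_{\tilde p_n}(\hat s_n + 1) = 1. $$

This proves that $\hat p_n$ is a probability mass function and completes the proof of the lemma. □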
5.2. Proof of Theorem 2

We begin with the following lemma, which gathers several properties of the minimizer $\hat p_n$. These properties can be compared with those of the constrained least squares estimator of a convex density function over $[0, \infty)$ (see Groeneboom et al., 2001). In that setting, the constrained LSE has a bounded support, is piecewise linear, has no changes of slope at the observation points, and has at most one change of slope between two consecutive observation points. In the discrete case, the constrained LSE is also piecewise linear with bounded support. However, because $\mathbb{N}$ is a discrete set, the constrained LSE can change slope at the observation points and can have two changes of slope between two consecutive observations.

Lemma 4. The unique function $\hat p_n \in \mathcal{K}$ that satisfies (11) has the following properties.

- $\hat p_n$ is linear on $\{0, \dots, X_{(1)} + 1\}$ and also on $\{\tilde s_n - 1, \dots, \hat s_n\}$.
- Suppose that $N_n$, the number of distinct values of the $X_i$'s, is at least 2. Then, for any given $j = 1, \dots, N_n - 1$, $\hat p_n$ has at most two changes of slope on $\{X_{(j)}, \dots, X_{(j+1)}\}$. If it has two changes of slope on this set, they occur at consecutive points in $\mathbb{N}$.

Proof. We know, from the proof of Lemma 1, that $\hat p_n = q$, where $q$ is defined as in (12) with $p$ replaced by $\hat p_n$. It follows that $\hat p_n$ is linear on $\{\tilde s_n - 1, \ldots, \hat s_n\}$ in the case $\hat s_n \geq \tilde s_n$. Consider an arbitrary candidate $p$ to be a minimizer of $Q_n$ over $\mathcal K$, fix $j \in \{1, \ldots, N_n - 1\}$, and define the functions $p_l$ and $p_r$ over $\mathbb N$ as follows: $p_l(i) = p(i)$ for all $i \leq X_{(j)} + 1$ and all $i \geq X_{(j+1)}$, and $p_l$ is linear over $\{X_{(j)}, \ldots, X_{(j+1)} - 1\}$, whereas $p_r(i) = p(i)$ for all $i \leq X_{(j)}$ and all $i \geq X_{(j+1)} - 1$, and $p_r$ is linear over $\{X_{(j)} + 1, \ldots, X_{(j+1)}\}$. Setting

$$q(i) = \max\{p_l(i), p_r(i)\} \quad \text{for all } i \in \mathbb N,$$

we obtain that $q \in \mathcal K$ is piecewise linear over $\{X_{(j)}, \ldots, X_{(j+1)}\}$, with at most two changes of slope over this interval, and in case it has two changes of slope, these changes occur at consecutive points. We have $q(X_{(k)}) = p(X_{(k)})$ for all $k$, and $q \leq p$ by convexity of $p$. Since $\tilde p_n(i) > 0$ if and only if $i = X_{(k)}$ for some $k$, this implies that $Q_n(q) \leq Q_n(p)$, with a strict inequality if $p \neq q$. Therefore, $p$ can be a minimizer of $Q_n$ only if $p = q$. This implies that the minimizer $\hat p_n$ is piecewise linear over $\{X_{(j)}, \ldots, X_{(j+1)}\}$, with at most two changes of slope over this interval. A similar argument shows that $\hat p_n$ is linear over the interval $\{0, \ldots, X_{(1)} + 1\}$. □
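The envelope $q = \max\{p_l, p_r\}$ used in this proof can be computed directly. The Python sketch below builds it for a hypothetical convex, non-increasing $p$ and two hypothetical consecutive observation points $a = X_{(j)}$, $b = X_{(j+1)}$, then reports the values of $q$ on $\{a, \ldots, b\}$ and the points where its slope changes. The function name `envelope` and the example data are assumptions made for illustration only.

```python
def envelope(p, a, b):
    """q = max(p_l, p_r) from the proof of Lemma 4, for consecutive observation points a < b."""
    val = lambda i: p[i] if i < len(p) else 0      # p is assumed to vanish beyond len(p) - 1
    slope_l = val(a + 1) - val(a)                  # slope of p on {a, a + 1}
    slope_r = val(b) - val(b - 1)                  # slope of p on {b - 1, b}
    q = []
    for i in range(max(len(p), b + 2)):
        # p_l: linear over {a, ..., b - 1}, equal to p elsewhere
        p_l = val(a) + slope_l * (i - a) if a <= i <= b - 1 else val(i)
        # p_r: linear over {a + 1, ..., b}, equal to p elsewhere
        p_r = val(b) + slope_r * (i - b) if a + 1 <= i <= b else val(i)
        q.append(max(p_l, p_r))
    return q

p = [(10 - i) ** 2 for i in range(11)]   # convex, non-increasing, vanishing from i = 10 on
a, b = 2, 9                              # hypothetical consecutive observation points
q = envelope(p, a, b)
slopes = [q[i + 1] - q[i] for i in range(a, b)]
kinks = [a + k for k in range(1, len(slopes)) if slopes[k] != slopes[k - 1]]
print(q[a:b + 1], kinks)                 # [64, 49, 34, 19, 10, 7, 4, 1] [5, 6]
```

In this example the two changes of slope fall at the consecutive points 5 and 6, $q \leq p$ everywhere, and $q$ agrees with $p$ at $a$ and $b$, as the proof asserts.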
Proof of Equation (3)

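We prove that (3) holds with $p_0$ replaced by any $q \in \mathcal K$ that belongs to $\ell_2$, i.e., that satisfies $\sum_{j \geq 0} q^2(j) < \infty$. Since $p_0$ belongs to $\ell_1$ as a probability mass function and $\ell_1 \subset \ell_2$, $p_0$ also belongs to $\ell_2$, so (3) with $p_0$ replaced by any $q \in \mathcal K$ that belongs to $\ell_2$ is a slightly more general result than (3).

Consider an arbitrary $q \in \mathcal K$ satisfying $\sum_{j \geq 0} q^2(j) < \infty$. Writing $q - \tilde p_n = (q - \hat p_n) + (\hat p_n - \tilde p_n)$ and expanding the square, we have

$$\sum_{j \geq 0}\big(q(j) - \tilde p_n(j)\big)^2 \geq \sum_{j \geq 0}\big(q(j) - \hat p_n(j)\big)^2 + 2\sum_{j \geq 0}\big(\hat p_n(j) - \tilde p_n(j)\big)\big(q(j) - \hat p_n(j)\big),$$

with a strict inequality in the case where $\tilde p_n$ is non-convex, since in that case $\hat p_n \neq \tilde p_n$. Thus, in order to prove that (3) holds with $p_0$ replaced by $q$, it suffices to prove that

$$\sum_{j \geq 0}\big(\hat p_n(j) - \tilde p_n(j)\big)\big(q(j) - \hat p_n(j)\big) \geq 0. \qquad (20)$$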

According to Lemma 4, there exist integers $c_0 < \cdots < c_m$ such that $c_0 = 0$, $c_m = \hat s_n + 1$, and $\hat p_n$ is linear over the interval $\{c_{i-1}, \ldots, c_i\}$ and has a change of slope at point $c_i$, for all $i = 1, \ldots, m$. It follows from Theorem 1 that $\hat s_n \geq \tilde s_n$, so $\hat p_n(j) = \tilde p_n(j) = 0$ for all $j \geq \hat s_n + 1$, and the sum in (20) can be split as follows:

$$\sum_{j \geq 0}\big(\hat p_n(j) - \tilde p_n(j)\big)\big(q(j) - \hat p_n(j)\big) = \sum_{i=1}^{m}\sum_{j=c_{i-1}}^{c_i - 1}\big(\hat p_n(j) - \tilde p_n(j)\big)f(j), \qquad (21)$$

where $f(j) = q(j) - \hat p_n(j)$ for all $j \geq 0$. For all $i = 1, \ldots, m$ we have

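$$\begin{aligned}
\sum_{j=c_{i-1}}^{c_i - 1}\big(\hat p_n(j) - \tilde p_n(j)\big)f(j)
&= \sum_{j=c_{i-1}}^{c_i - 1}\big(F_{\hat p_n}(j) - F_{\tilde p_n}(j)\big)f(j) - \sum_{j=c_{i-1}-1}^{c_i - 2}\big(F_{\hat p_n}(j) - F_{\tilde p_n}(j)\big)f(j+1)\\
&= \sum_{j=c_{i-1}}^{c_i - 1}\big(F_{\hat p_n}(j) - F_{\tilde p_n}(j)\big)\big(f(j) - f(j+1)\big)\\
&\quad + \big(F_{\hat p_n}(c_i - 1) - F_{\tilde p_n}(c_i - 1)\big)f(c_i)\\
&\quad - \big(F_{\hat p_n}(c_{i-1} - 1) - F_{\tilde p_n}(c_{i-1} - 1)\big)f(c_{i-1}),
\end{aligned}$$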
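where $F_{\hat p_n}$ and $F_{\tilde p_n}$ are the cumulative sums defined earlier, $F_p(j) = \sum_{i=0}^{j} p(i)$. By definition, $F_{\hat p_n}(j) = F_{\tilde p_n}(j) = 0$ for all $j < c_0$, so summing up over $i$ yields

$$\begin{aligned}
\sum_{i=1}^{m}\sum_{j=c_{i-1}}^{c_i - 1}\big(\hat p_n(j) - \tilde p_n(j)\big)f(j)
&= \sum_{i=1}^{m}\sum_{j=c_{i-1}}^{c_i - 1}\big(F_{\hat p_n}(j) - F_{\tilde p_n}(j)\big)\big(f(j) - f(j+1)\big)\\
&\quad + \big(F_{\hat p_n}(c_m - 1) - F_{\tilde p_n}(c_m - 1)\big)f(c_m),
\end{aligned}$$

where we recall that $c_m = \hat s_n + 1$. Now, it follows from the definition of $\hat s_n$ that $\hat p_n(\hat s_n + 1) = 0$, and we also have $\tilde p_n(\hat s_n + 1) = 0$ since $\tilde s_n \leq \hat s_n$, see Theorem 1. Thanks to (19), we conclude that $F_{\hat p_n}(\hat s_n) = F_{\tilde p_n}(\hat s_n)$. Therefore, (21) combined with the preceding display yields

$$\sum_{j \geq 0}\big(\hat p_n(j) - \tilde p_n(j)\big)\big(q(j) - \hat p_n(j)\big) = \sum_{i=1}^{m}\sum_{j=c_{i-1}}^{c_i - 1}\big(F_{\hat p_n}(j) - F_{\tilde p_n}(j)\big)\big(f(j) - f(j+1)\big).$$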
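The summation by parts above holds for arbitrary sequences, not only for $\hat p_n$ and $\tilde p_n$, so it can be checked numerically. The Python sketch below verifies, for hypothetical integer sequences and a hypothetical partition $c_0 = 0 < c_1 < \cdots < c_m$, both the block-wise identity and its telescoped sum, with $F_p(j) = \sum_{i \leq j} p(i)$ and $F_p(j) = 0$ for $j < 0$. The helper names `cumsum` and `check_first_step` are illustrative.

```python
import random

def cumsum(seq):
    # running sums: cumsum(p)[j] = p(0) + ... + p(j), i.e. F_p(j)
    out, s = [], 0
    for x in seq:
        s += x
        out.append(s)
    return out

def check_first_step(p_hat, p_tilde, f, c):
    """Check the block-wise summation by parts and its telescoped sum over c_0 < ... < c_m."""
    D = [x - y for x, y in zip(cumsum(p_hat), cumsum(p_tilde))]   # F_{p_hat} - F_{p_tilde}
    Dm = lambda j: D[j] if j >= 0 else 0                           # F_p(j) = 0 for j < 0
    blocks_ok, total = True, 0
    for a, b in zip(c[:-1], c[1:]):
        block = sum((p_hat[j] - p_tilde[j]) * f[j] for j in range(a, b))
        main = sum(Dm(j) * (f[j] - f[j + 1]) for j in range(a, b))
        blocks_ok &= block == main + Dm(b - 1) * f[b] - Dm(a - 1) * f[a]
        total += main
    cm = c[-1]
    lhs = sum((p_hat[j] - p_tilde[j]) * f[j] for j in range(cm))
    return blocks_ok, lhs == total + Dm(cm - 1) * f[cm]           # boundary terms telescope

rng = random.Random(0)
p_hat, p_tilde, f = ([rng.randint(-5, 5) for _ in range(12)] for _ in range(3))
print(check_first_step(p_hat, p_tilde, f, [0, 2, 5, 9]))           # (True, True)
```

Integer data keep the comparison exact; in the proof the boundary term at $c_m$ vanishes because $F_{\hat p_n}(\hat s_n) = F_{\tilde p_n}(\hat s_n)$.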



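Now $H_{\hat p_n}(j) = H_{\tilde p_n}(j) = 0$ for all $j < c_0$ and $F_p(j) = H_p(j) - H_p(j - 1)$ for $p = \hat p_n, \tilde p_n$ and all $j$, so we can repeat the same arguments as above to obtain

$$\begin{aligned}
\sum_{i=1}^{m}\sum_{j=c_{i-1}}^{c_i - 1}\big(F_{\hat p_n}(j) - F_{\tilde p_n}(j)\big)\big(f(j) - f(j+1)\big)
&= \sum_{i=1}^{m}\sum_{j=c_{i-1}}^{c_i - 1}\big(H_{\hat p_n}(j) - H_{\tilde p_n}(j)\big)\big(f(j) - 2f(j+1) + f(j+2)\big)\\
&\quad + \big(H_{\hat p_n}(c_m - 1) - H_{\tilde p_n}(c_m - 1)\big)\big(f(c_m) - f(c_m + 1)\big).
\end{aligned}$$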

Since $\hat p_n$ has a change of slope at each $c_i$, we deduce from Lemma 2 that $H_{\hat p_n}(c_i - 1) = H_{\tilde p_n}(c_i - 1)$ for all $i = 1, \ldots, m$, and we arrive at

$$\sum_{j \geq 0}\big(\hat p_n(j) - \tilde p_n(j)\big)\big(q(j) - \hat p_n(j)\big) = \sum_{i}\sum_{j=c_{i-1}}^{c_i - 2}\big(H_{\hat p_n}(j) - H_{\tilde p_n}(j)\big)\big(f(j) - 2f(j+1) + f(j+2)\big), \qquad (22)$$

where the first sum on the right-hand side is taken over those $i = 1, \ldots, m$ such that $c_{i-1} \leq c_i - 2$. For such an $i$, $f$ is convex over the interval $\{c_{i-1}, \ldots, c_i\}$ as the sum of a convex function and a linear function (recall that, by definition of the $c_i$'s, $\hat p_n$ is linear over such an interval). Therefore we get

$$f(j) - 2f(j+1) + f(j+2) \geq 0$$

for all $j = c_{i-1}, \ldots, c_i - 2$, see (2). Moreover, it follows from Lemma 2 that $H_{\hat p_n} \geq H_{\tilde p_n}$, which leads to

$$\big(H_{\hat p_n}(j) - H_{\tilde p_n}(j)\big)\big(f(j) - 2f(j+1) + f(j+2)\big) \geq 0$$

for all $j = c_{i-1}, \ldots, c_i - 2$. Combining this with (22) yields (20) and completes the proof of the first part of the theorem.
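The second summation by parts, which trades the differences of $F$ against $f(j) - f(j+1)$ for differences of $H$ against second differences of $f$, is also a purely algebraic identity. The Python sketch below checks it on hypothetical integer data, taking $H_p$ as the running sum of $F_p$ (so that $F_p(j) = H_p(j) - H_p(j-1)$) and keeping the single boundary term at $c_m$. Names and data are illustrative assumptions.

```python
import random

def cumsum(seq):
    # running sums of a sequence
    out, s = [], 0
    for x in seq:
        s += x
        out.append(s)
    return out

def check_second_step(p_hat, p_tilde, f, c):
    """F-differences against f(j) - f(j+1) become H-differences against
    second differences of f, plus one boundary term at c_m."""
    D = [x - y for x, y in zip(cumsum(p_hat), cumsum(p_tilde))]   # F_{p_hat} - F_{p_tilde}
    E = cumsum(D)                                                  # H_{p_hat} - H_{p_tilde}
    cm = c[-1]
    lhs = sum(D[j] * (f[j] - f[j + 1]) for j in range(cm))
    rhs = sum(E[j] * (f[j] - 2 * f[j + 1] + f[j + 2])
              for a, b in zip(c[:-1], c[1:]) for j in range(a, b))
    rhs += E[cm - 1] * (f[cm] - f[cm + 1])
    return lhs == rhs

rng = random.Random(1)
p_hat, p_tilde, f = ([rng.randint(-5, 5) for _ in range(12)] for _ in range(3))
print(check_second_step(p_hat, p_tilde, f, [0, 2, 5, 9]))          # True
```

In the proof, the terms with $j = c_i - 1$ and the boundary term drop out because $H_{\hat p_n}(c_i - 1) = H_{\tilde p_n}(c_i - 1)$, which is why the inner sums in (22) stop at $c_i - 2$.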

Proof of Equations (4) and (5)

It suffices to prove (4), since the second assertion follows from (4) and (3). To prove (4), note that

$$P\big(\tilde p_n \text{ is non-convex}\big) \geq P\big(\tilde p_n(i) - 2\tilde p_n(i+1) + \tilde p_n(i+2) < 0\big)$$

and that, by assumption, we have $p_0(i) - 2p_0(i+1) + p_0(i+2) = 0$. Therefore, we have the following inequality:

$$P\big(\tilde p_n \text{ is non-convex}\big) \geq P\Big(\sqrt n\,\big[(\tilde p_n(i) - p_0(i)) - 2(\tilde p_n(i+1) - p_0(i+1)) + (\tilde p_n(i+2) - p_0(i+2))\big] < 0\Big).$$

By the central limit theorem, the random variable

$$\sqrt n\,\big[(\tilde p_n(i) - p_0(i)) - 2(\tilde p_n(i+1) - p_0(i+1)) + (\tilde p_n(i+2) - p_0(i+2))\big]$$

converges in distribution, as $n \to \infty$, to a centered Gaussian variable $X$ with a non-degenerate variance, and therefore

$$\liminf_{n \to \infty} P\big(\tilde p_n \text{ is non-convex}\big) \geq P(X < 0).$$

The result follows since $P(X < 0) = 1/2$. □
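The bound $\liminf_n P(\tilde p_n \text{ is non-convex}) \geq 1/2$ can be observed by simulation. The Python sketch below draws samples from the triangular pmf $p_0(i) = 2(5 - i)/30$, $i = 0, \ldots, 4$, whose second differences vanish at $i = 0, \ldots, 3$, and records how often the empirical pmf violates convexity. The choice of $p_0$, the sample size, the number of replications and the seed are arbitrary assumptions; since this $p_0$ has several linear stretches while the bound uses a single index, the observed frequency can be well above $1/2$. This illustrates, but does not prove, the result.

```python
import random

def second_differences(counts):
    # c(i) - 2 c(i+1) + c(i+2), with the counts extended by zeros
    c = list(counts) + [0, 0]
    return [c[i] - 2 * c[i + 1] + c[i + 2] for i in range(len(counts))]

def freq_nonconvex(p0, n, reps, seed=0):
    """Monte Carlo frequency of a non-convex empirical pmf for samples of size n from p0."""
    rng = random.Random(seed)
    support = range(len(p0))
    hits = 0
    for _ in range(reps):
        counts = [0] * len(p0)
        for x in rng.choices(support, weights=p0, k=n):
            counts[x] += 1
        # convexity is scale invariant, so integer counts can replace n * p_tilde (no rounding)
        hits += any(d < 0 for d in second_differences(counts))
    return hits / reps

p0 = [2 * (5 - i) / 30 for i in range(5)]   # triangular pmf, linear on {0, ..., 5}
print(freq_nonconvex(p0, n=1000, reps=400))
```

Working with integer counts rather than floating-point frequencies avoids spurious negative second differences caused by rounding.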
5.3. Proof of Theorem 3

Let us first note that for any positive concave function $q$ defined on $\mathbb N$ such that $q(\hat s_n) > 0$ and $q(i) = 0$ for all $i > \hat s_n$, the function $\hat p_n - \varepsilon q$ belongs to $\mathcal K$ as soon as $\varepsilon \leq \hat p_n(\hat s_n)/q(\hat s_n)$.

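Besides, thanks to Theorem 1, we know that $\hat p_n = \arg\min_{f \in \mathcal K} Q_n(f)$. Therefore, for all $q$ defined as above,

$$0 \leq \lim_{\varepsilon \searrow 0}\frac{Q_n(\hat p_n - \varepsilon q) - Q_n(\hat p_n)}{\varepsilon} = \sum_{i=0}^{\hat s_n} q(i)\big(\tilde p_n(i) - \hat p_n(i)\big).$$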

Let M ^ 1, ^ a ^ and take 



and $q(i) = 0$ for $i > \hat s_n$. Then we get the inequality in Equation (6).

The proof of X]i=i^Pn(0 — Si=i^Pn(0 follows from the fact that the 
function $\hat p_n + \varepsilon q$ belongs to $\mathcal K$ for

q{i) = I 1 — — j for 1 ^ z ^ s'n, and q{i) = 0, for i > 

It remains to prove that $\hat p_n(0) \geq \tilde p_n(0)$. Argue by contradiction and assume that $\hat p_n(0) < \tilde p_n(0)$. Define $q(0) = \tilde p_n(0)$ and $q(i) = \hat p_n(i)$ for all $i \geq 1$. Then $q \in \mathcal K$, since $\hat p_n$ is convex and $q(0) \geq \hat p_n(0)$, and we have $Q_n(q) < Q_n(\hat p_n)$. This is a contradiction since $\hat p_n$ minimizes $Q_n$ over $\mathcal K$, see Theorem 1. This completes the proof of the theorem.

5.4. Proof of Theorem 4

Assume $f \in \mathcal K$ and consider the function $\pi$ defined by (8). The function $\pi$ takes non-negative values since $f$ is convex, see (2). Therefore $\pi$ belongs to $\mathcal M$. Moreover, for all $i \in \mathbb N$ we have

$$\begin{aligned}
\sum_{j \geq 1}\pi_j T_j(i) &= \sum_{j \geq i+1}\big(f(j+1) + f(j-1) - 2f(j)\big)(j - i)\\
&= \sum_{j \geq i+1}\sum_{k=1}^{j-i}\big(f(j+1) + f(j-1) - 2f(j)\big).
\end{aligned}$$

Since all terms in the sum are non-negative and $\lim_{i \to \infty} f(i) = 0$, we can write

$$\begin{aligned}
\sum_{j \geq 1}\pi_j T_j(i) &= \sum_{k \geq 1}\sum_{j \geq i+k}\big(f(j+1) + f(j-1) - 2f(j)\big)\\
&= \sum_{k \geq 1}\big(f(i+k-1) - f(i+k)\big)\\
&= f(i)
\end{aligned}$$

for all $i \in \mathbb N$. Therefore, $\pi \in \mathcal M$ satisfies (7). Conversely, every $f : \mathbb N \to [0, \infty)$ satisfying (7) for some $\pi \in \mathcal M$ is clearly convex, so we obtain the first assertion of the theorem. To prove the second and the third assertions, we assume that $f \in \mathcal K$. So, in view of the preceding result, we know that $f$ satisfies (7) for some $\pi \in \mathcal M$. Thus, we have

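$$f(i-1) - f(i) = \sum_{j \geq 1}\pi_j\big(T_j(i-1) - T_j(i)\big) = \sum_{j \geq i}\frac{2\pi_j}{j(j+1)}$$

for all $i \geq 1$. By convexity of $f$ we conclude that

$$0 \leq \big(f(i-1) - f(i)\big) - \big(f(i) - f(i+1)\big) = \frac{2\pi_i}{i(i+1)}$$

for all $i \geq 1$, which implies that $\pi$ is uniquely defined by (8). Moreover,

$$\sum_{i=0}^{\infty} f(i) = \sum_{i=0}^{\infty}\sum_{j \geq i+1}\pi_j T_j(i),$$

since $T_j(i) = 0$ for all $j \leq i$. This implies that

$$\sum_{i=0}^{\infty} f(i) = \sum_{j=1}^{\infty}\pi_j\sum_{i=0}^{j-1}T_j(i) = \sum_{j=1}^{\infty}\pi_j,$$

where $\sum_{i \geq 0} T_j(i) = 1$. This completes the proof of the theorem. □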
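The representation in this proof can be reproduced in exact arithmetic. The Python sketch below computes the weights $\pi_j = \frac{j(j+1)}{2}\big(f(j-1) - 2f(j) + f(j+1)\big)$, which is what the identity $\frac{2\pi_i}{i(i+1)} = (f(i-1) - f(i)) - (f(i) - f(i+1))$ above gives, rebuilds $f = \sum_j \pi_j T_j$, and checks $\sum_j \pi_j = \sum_i f(i)$. It assumes $T_j(i) = 2(j - i)_+/(j(j+1))$, the normalization that makes each $T_j$ a probability mass function and is consistent with the factor $(j - i)$ in the first display; the example $f$ is hypothetical.

```python
from fractions import Fraction as F

def T(j, i):
    # assumed triangular pmf T_j(i) = 2 (j - i)_+ / (j (j + 1)), j >= 1
    return F(2 * max(j - i, 0), j * (j + 1))

def mixing_weights(f):
    # pi_j = j (j + 1) / 2 * (f(j - 1) - 2 f(j) + f(j + 1)), j >= 1, f extended by zeros
    g = list(f) + [F(0), F(0)]
    return {j: F(j * (j + 1), 2) * (g[j - 1] - 2 * g[j] + g[j + 1])
            for j in range(1, len(g) - 1)}

f = [F(k, 19) for k in (8, 5, 3, 2, 1)]    # a hypothetical convex pmf
pi = mixing_weights(f)
rebuilt = [sum(w * T(j, i) for j, w in pi.items()) for i in range(len(f))]
print({j: w for j, w in pi.items() if w})  # {1: Fraction(1, 19), 2: Fraction(3, 19), 5: Fraction(15, 19)}
print(rebuilt == f, sum(pi.values()) == sum(f) == 1)
```

The weights are non-negative because $f$ is convex, and they sum to one because $f$ is a probability mass function, in line with the last display of the proof.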

5.5. Proof of Theorem 5

The theorem is proved following the work of Groeneboom et al. (2008). It follows from Lemmas 5 and 6 given below.

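Lemma 5. Let $\hat s_n$ be the maximum of the support of $\hat p_n$ and $L \geq \hat s_n + 1$. Then $\pi^L = \arg\min_{\pi \in \mathcal M^L}\Psi_n(\pi)$ if and only if

$$[d_j(\Psi_n)](\pi^L) \geq 0 \quad \forall\, 1 \leq j \leq L, \qquad \text{and} \qquad [d_j(\Psi_n)](\pi^L) = 0 \quad \forall\, j \in \mathrm{Supp}(\pi^L). \qquad (23)$$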
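Proof. Let $\pi^L = \arg\min_{\pi \in \mathcal M^L}\Psi_n(\pi)$.

For all $1 \leq j \leq L$ and $\varepsilon > 0$, $\pi^L + \varepsilon\delta_j \in \mathcal M^L$, and $\Psi_n(\pi^L + \varepsilon\delta_j) \geq \Psi_n(\pi^L)$. It follows that $[d_j(\Psi_n)](\pi^L) \geq 0$.

If $j \in \mathrm{Supp}(\pi^L)$, then for $\varepsilon > 0$ small enough, $\pi^L - \varepsilon\delta_j \in \mathcal M^L$, and $\Psi_n(\pi^L - \varepsilon\delta_j) \geq \Psi_n(\pi^L)$. It follows that $-[d_j(\Psi_n)](\pi^L) \geq 0$.

Conversely, assume that Equation (23) is satisfied, and take $\pi \in \mathcal M^L$. Then $[D_\pi(\Psi_n)](\pi^L)$ is non-negative and $[D_{\pi^L}(\Psi_n)](\pi^L) = 0$, thanks to Equation (23).

By convexity of $\Psi_n$, for $\varepsilon \in (0, 1]$,

$$\frac{\Psi_n\big(\pi^L + \varepsilon(\pi - \pi^L)\big) - \Psi_n(\pi^L)}{\varepsilon} \leq \Psi_n(\pi) - \Psi_n(\pi^L).$$

Taking the limit when $\varepsilon$ tends to 0, we get

$$\Psi_n(\pi) - \Psi_n(\pi^L) \geq [D_{\pi - \pi^L}(\Psi_n)](\pi^L) = [D_\pi(\Psi_n)](\pi^L) - [D_{\pi^L}(\Psi_n)](\pi^L) \geq 0.$$

□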
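Lemmas 5 and 6 only use first-order information on $\Psi_n$. The Python sketch below computes $[d_j(\Psi_n)](\pi)$ under the assumption, not stated in this section, that $\Psi_n(\pi) = Q_n\big(\sum_j \pi_j T_j\big)$ with $Q_n(q) = \frac12\sum_l (q(l) - \tilde p_n(l))^2$ and $T_j(l) = 2(j - l)_+/(j(j+1))$. Under that assumption $\Psi_n$ is quadratic, so the difference quotient equals $[d_j(\Psi_n)](\pi) + \frac{\varepsilon}{2}\sum_l T_j^2(l)$ exactly, which the example checks. The helper names, the empirical pmf and the measure `pi` are hypothetical.

```python
from fractions import Fraction as F

def T(j, l):
    # assumed triangular pmf T_j(l) = 2 (j - l)_+ / (j (j + 1)), j >= 1
    return F(2 * max(j - l, 0), j * (j + 1))

def residual(pi, p_emp, size):
    # f_pi(l) - p_emp(l) for l < size, where f_pi = sum_j pi_j T_j
    e = list(p_emp) + [F(0)] * (size - len(p_emp))
    return [sum(w * T(j, l) for j, w in pi.items()) - e[l] for l in range(size)]

def psi(pi, p_emp, size):
    # assumed criterion Psi_n(pi) = Q_n(f_pi) = 1/2 sum_l (f_pi(l) - p_emp(l))^2
    return sum(r * r for r in residual(pi, p_emp, size)) / 2

def d(j, pi, p_emp, size):
    # directional derivative [d_j Psi_n](pi) = sum_l T_j(l) (f_pi(l) - p_emp(l))
    return sum(T(j, l) * r for l, r in enumerate(residual(pi, p_emp, size)))

p_emp = [F(4, 10), F(1, 10), F(3, 10), F(2, 10)]   # hypothetical non-convex empirical pmf
pi = {2: F(1, 2), 4: F(1, 2)}
size, j, eps = 6, 3, F(1, 100)                      # size must cover every atom and p_emp
pi_eps = dict(pi)
pi_eps[j] = pi_eps.get(j, F(0)) + eps
quotient = (psi(pi_eps, p_emp, size) - psi(pi, p_emp, size)) / eps
print(quotient == d(j, pi, p_emp, size) + eps / 2 * sum(T(j, l) ** 2 for l in range(size)))
```

When $\tilde p_n$ is itself a mixture of the $T_j$, the measure that reproduces it makes every $[d_j(\Psi_n)]$ vanish, so conditions (23) hold trivially.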



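Lemma 6. Let us define the following quantities.

• Let $\pi = \sum_{i=1}^{L-1} a_i\delta_{j_i}$ be the minimizer of $\Psi_n$ over the set of positive measures spanned by $\{\delta_{j_i}, 1 \leq i \leq L - 1\}$.

• Let $j_L$ be an integer such that $j_L \neq j_i$ for all $i = 1, \ldots, L - 1$, and

$$[d_{j_L}(\Psi_n)](\pi) < 0.$$

• Let $\pi^* = \sum_{i=1}^{L} b_i\delta_{j_i}$ be the minimizer of $\Psi_n$ over the set spanned by $\{\delta_{j_i}, 1 \leq i \leq L\}$.

Then $b_L > 0$, and there exists $\varepsilon > 0$ such that $\pi + \varepsilon(\pi^* - \pi)$ is a non-negative measure, and such that $\Psi_n\big(\pi + \varepsilon(\pi^* - \pi)\big) < \Psi_n(\pi)$.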

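Proof. Following the same arguments as in the proof of Lemma 5, we have $[d_{j_i}(\Psi_n)](\pi) = 0$ for all $i = 1, \ldots, L - 1$. Then

$$[D_\pi(\Psi_n)](\pi) = \sum_{i=1}^{L-1} a_i\,[d_{j_i}(\Psi_n)](\pi) = 0.$$

Moreover, we have

$$\Psi_n(\pi + \varepsilon\delta_{j_L}) = \Psi_n(\pi) + O(\varepsilon^2) + \varepsilon\,[d_{j_L}(\Psi_n)](\pi).$$

Therefore, $[d_{j_L}(\Psi_n)](\pi) < 0$ implies that for $\varepsilon > 0$ small enough,

$$\Psi_n(\pi + \varepsilon\delta_{j_L}) < \Psi_n(\pi).$$

This shows that $\pi^* \neq \pi$.

By convexity of $\Psi_n$, we show that

$$\lim_{\varepsilon \searrow 0}\frac{\Psi_n\big(\pi + \varepsilon(\pi^* - \pi)\big) - \Psi_n(\pi)}{\varepsilon} \leq \lim_{\varepsilon \searrow 0}\frac{(1 - \varepsilon)\Psi_n(\pi) + \varepsilon\Psi_n(\pi^*) - \Psi_n(\pi)}{\varepsilon} = \Psi_n(\pi^*) - \Psi_n(\pi) < 0.$$

This shows that for $\varepsilon > 0$ small enough, $\Psi_n\big(\pi + \varepsilon(\pi^* - \pi)\big) < \Psi_n(\pi)$. Besides, we have

$$\begin{aligned}
\lim_{\varepsilon \searrow 0}\frac{1}{\varepsilon}\Big(\Psi_n\big((1 - \varepsilon)\pi + \varepsilon\pi^*\big) - \Psi_n(\pi)\Big) &= [D_{\pi^* - \pi}(\Psi_n)](\pi)\\
&= \sum_{i=1}^{L} b_i\,[d_{j_i}(\Psi_n)](\pi) - [D_\pi(\Psi_n)](\pi)\\
&= b_L\,[d_{j_L}(\Psi_n)](\pi).
\end{aligned}$$

Because this limit is negative and $[d_{j_L}(\Psi_n)](\pi) < 0$, $b_L$ is positive.

It remains to show that there exists $\varepsilon > 0$ such that, for all $1 \leq i \leq L - 1$, $a_i + \varepsilon(b_i - a_i)$ is non-negative. This is clearly the case if $b_i \geq a_i$. If not, take $\varepsilon \leq \min_{b_i < a_i}\{a_i/(a_i - b_i)\}$. □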
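Lemmas 5 and 6 are the two ingredients of a support reduction loop: add the atom with the most negative directional derivative, refit on the enlarged support, and, while the refit has negative weights, move towards it only until the first weight hits zero and drop that atom. The Python sketch below is a minimal version of such a loop, written for illustration and not taken from the authors' implementation. It assumes $\Psi_n(\pi) = Q_n\big(\sum_j \pi_j T_j\big)$ with $Q_n(q) = \frac12\sum_l (q(l) - \tilde p_n(l))^2$ and $T_j(l) = 2(j - l)_+/(j(j+1))$, uses exact rational arithmetic, and enlarges $L$ until the fitted mass equals one, a stopping rule suggested by Lemma 7 and Theorem 6. All function names are hypothetical.

```python
from fractions import Fraction as F

def T(j, l):
    # assumed triangular pmf T_j(l) = 2 (j - l)_+ / (j (j + 1)), j >= 1
    return F(2 * max(j - l, 0), j * (j + 1))

def solve(A, r):
    # exact Gauss-Jordan elimination; A is a Gram matrix of independent T_j, hence invertible
    n = len(r)
    M = [list(row) + [r[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = next(k for k in range(c, n) if M[k][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        for k in range(n):
            if k != c and M[k][c] != 0:
                factor = M[k][c] / M[c][c]
                M[k] = [x - factor * y for x, y in zip(M[k], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ls_weights(S, p, size):
    # unconstrained least-squares fit of p by span{T_j : j in S}
    G = [[sum(T(j, l) * T(k, l) for l in range(size)) for k in S] for j in S]
    r = [sum(T(j, l) * p[l] for l in range(size)) for j in S]
    return dict(zip(S, solve(G, r)))

def derivative(j, pi, p, size):
    # [d_j Psi_n](pi) = sum_l T_j(l) (f_pi(l) - p(l))
    return sum(T(j, l) * (sum(w * T(k, l) for k, w in pi.items()) - p[l])
               for l in range(size))

def support_reduction(p_emp, L):
    """Minimise Psi_n over measures on {1, ..., L}; requires L >= len(p_emp)."""
    p = list(p_emp) + [F(0)] * (L - len(p_emp))   # every T_j with j <= L vanishes from l = L on
    pi = ls_weights([L], p, L)                    # start from the single atom T_L
    while True:
        d = {j: derivative(j, pi, p, L) for j in range(1, L + 1)}
        j_new = min(d, key=d.get)
        if d[j_new] >= 0:                         # conditions (23) hold
            return pi
        cur = dict(pi)
        cur[j_new] = F(0)
        beta = ls_weights(sorted(cur), p, L)
        while any(w < 0 for w in beta.values()):
            # move from cur towards beta until a weight hits zero, then drop it (Lemma 6)
            t = min(cur[j] / (cur[j] - beta[j]) for j in beta if beta[j] < 0)
            cur = {j: cur[j] + t * (beta[j] - cur[j]) for j in cur}
            cur = {j: w for j, w in cur.items() if w > 0}
            beta = ls_weights(sorted(cur), p, L)
        pi = {j: w for j, w in beta.items() if w > 0}

def convex_lse(p_emp):
    # enlarge L until the fitted mass equals one, then return the fitted pmf on {0, ..., L - 1}
    L = len(p_emp)
    while True:
        pi = support_reduction(p_emp, L)
        if sum(pi.values()) == 1:
            return [sum(w * T(j, l) for j, w in pi.items()) for l in range(L)]
        L += 1

p_emp = [F(4, 10), F(1, 10), F(3, 10), F(2, 10)]   # hypothetical non-convex empirical pmf
print(convex_lse(p_emp))
```

Exact fractions make the stopping tests `d[j_new] >= 0` and `sum(...) == 1` reliable; a floating-point version would need tolerances. The output can be checked against Theorems 1 to 3: it is a convex probability mass function whose support extends at least to $\tilde s_n$, with $\hat p_n(0) \geq \tilde p_n(0)$.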

5.6. Proof of Theorem 6

Let us begin with the following lemma. 

Lemma 7. If $\pi^L = \arg\min_{\pi \in \mathcal M^L}\Psi_n(\pi)$, then for all $j \geq 1$,

$$[d_{L+j}(\Psi_n)](\pi^L) = b\Big(\sum_{i=1}^{L}\pi^L_i - 1\Big)$$

for some positive constant $b$ depending on $j$ and on the maximum of the support of $\pi^L$.
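Proof. Let us consider two cases according to whether $\pi^L_L$ equals 0 or not. Suppose that $\pi^L_L > 0$, and write

$$\begin{aligned}
[d_{L+j}(\Psi_n)](\pi^L) &= \sum_{l=0}^{L+j-1} T_{L+j}(l)\Big(\sum_{j'=l+1}^{L}\pi^L_{j'}T_{j'}(l) - \tilde p_n(l)\Big)\\
&= \sum_{l=0}^{L-1} T_{L+j}(l)\Big(\sum_{j'=l+1}^{L}\pi^L_{j'}T_{j'}(l) - \tilde p_n(l)\Big).
\end{aligned}$$

Because, for $0 \leq l \leq L - 1$, $T_{L+j}(l) = aT_L(l) + b$ for constants $a$ and $b$ depending on $L$ and $j$, we get

$$[d_{L+j}(\Psi_n)](\pi^L) = a\,[d_L(\Psi_n)](\pi^L) + b\sum_{l=0}^{L-1}\Big(\sum_{j'=l+1}^{L}\pi^L_{j'}T_{j'}(l) - \tilde p_n(l)\Big).$$

Following Lemma 5, $[d_L(\Psi_n)](\pi^L) = 0$, and we get

$$[d_{L+j}(\Psi_n)](\pi^L) = b\Big(\sum_{i=1}^{L}\sum_{l=0}^{i-1}\pi^L_i T_i(l) - 1\Big) = b\Big(\sum_{i=1}^{L}\pi^L_i - 1\Big).$$

If $\pi^L_L = 0$, then $\pi^L \in \mathcal M^{L_1}$ for some $L_1 < L$. Thanks to Lemma 5, we know that $\pi^L$ is the minimizer of $\Psi_n$ over $\mathcal M^{L_1}$. Then we can compute $[d_{L_1+j}(\Psi_n)](\pi^L)$ for all $j \geq 1$ exactly as we have done in the case $\pi^L_L > 0$.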
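The key step above is that $T_{L+j}$ is an affine function of $T_L$ on $\{0, \ldots, L - 1\}$. Assuming the triangular pmfs $T_j(l) = 2(j - l)_+/(j(j+1))$, the constants can be written down explicitly: $a = \frac{L(L+1)}{(L+j)(L+j+1)}$ and $b = \frac{2j}{(L+j)(L+j+1)} = T_{L+j}(L) > 0$. The Python sketch below checks these formulas exactly for a small range of $L$ and $j$; the explicit constants are derived here under that assumption and are not quoted from the paper.

```python
from fractions import Fraction as F

def T(j, l):
    # assumed triangular pmf T_j(l) = 2 (j - l)_+ / (j (j + 1)), j >= 1
    return F(2 * max(j - l, 0), j * (j + 1))

def affine_coeffs(L, j):
    # T_{L+j}(l) = a * T_L(l) + b for l = 0, ..., L - 1
    a = F(L * (L + 1), (L + j) * (L + j + 1))
    b = F(2 * j, (L + j) * (L + j + 1))
    return a, b

def check_affine(L_max, j_max):
    # exact check of the affine relation and of b > 0 for 1 <= L <= L_max, 1 <= j <= j_max
    for L in range(1, L_max + 1):
        for j in range(1, j_max + 1):
            a, b = affine_coeffs(L, j)
            if b <= 0 or any(T(L + j, l) != a * T(L, l) + b for l in range(L)):
                return False
    return True

print(affine_coeffs(3, 2), check_affine(8, 6))   # (Fraction(2, 5), Fraction(2, 15)) True
```

Positivity of $b$ is what turns $[d_{L+j}(\Psi_n)](\pi^L)$ into a quantity with the sign of $\sum_i \pi^L_i - 1$.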

To conclude the proof of Theorem 6, note first that for all $L' \leq L$, we have $\mathcal M^{L'} \subset \mathcal M^L$, which implies

$$\Psi_n(\pi^L) \leq \Psi_n(\pi^{L'}). \qquad (24)$$

Second, it follows from Lemmas 5 and 7 that if $\sum_{i=1}^{L}\pi^L_i = 1$, then for all $L' \geq L$, $\pi^{L'} = \pi^L$. Therefore Equation (24) holds for all positive integers $L'$, which implies that

$$\Psi_n(\pi^L) \leq \Psi_n(\pi)$$

for all measures $\pi \in \mathcal M$ with a finite support. Therefore $\pi^L = \pi_n$. □